Multiple hypothesis testing often involves composite nulls, i.e.,
nulls that are associated with two or more distributions. In many
cases, it is reasonable to assume that there is a prior distribution
on these distributions, even though it is unknown. When the number of
distributions under true nulls is finite, we show that under the above
assumption, the false discovery rate (FDR) can be controlled using
p-values computed under constraints imposed by the empirical
distribution of the observations. Compared with FDR control using p-values
defined as the maximum significance level over all null distributions, the
proposed FDR control can have substantially more power.

1. Introduction. In hypothesis testing, a relatively simple case is where 
the data associated with true nulls and those with false nulls each follow a 
common distribution ("simple versus simple") [4, (3]. On the other hand, 
in many cases, either the data associated with true nulls follow different 
distributions ("composite nulls") or those associated with false nulls follow 
different distributions ("composite alternatives"). In the current literature 
on multiple testing, once appropriate test statistics such as p-values are computed,
testing procedures based on the statistics usually do not distinguish
between the simple and composite cases [11, 10, 16, 7, 14]. At the time when 
a procedure is applied, it only has the test statistics available. For this rea- 
son, how the test statistics are defined plays an important role in the overall 
performance of the procedure. 

For composite nulls, p-values are usually defined as maximum probabilities
over all null distributions [10]. Following the random-effects extension
for composite alternatives [6], a Bayesian approach to calculating p-values
can be used. Specifically, one assumes that there is a known prior
distribution on the null distributions. Since the overall distribution of the data
associated with true nulls can now be determined by an integral of the null 
distributions weighted by the prior, the composite case is essentially reduced 
to the simple one. 

The focus of the article lies between the above two approaches. The
underlying premise is that there is a prior distribution on the null distributions,
but the prior is unknown. The basic observation is that, in the presence
of a large number of nulls, the empirical distribution of the data provides
useful information on the prior. More specifically, the mixture of the null
distributions, if multiplied by the population fraction of true nulls, is
dominated by the empirical distribution of the data plus a small margin. This
constrains the set of possible priors. We shall explore this observation for
the case where there are only a finite number of null distributions. On the
one hand, the p-values will be calculated as maximum probabilities. On the
other, the maximization is over a range of linear combinations of the null
distributions, with the coefficients being constrained. As a result, the p-values
can be computed by linear programming.

The article does not consider the case of composite alternatives. The 
position here is that, since oftentimes no information on the distributions 
under false nulls is available, it is sensible to regard data associated with 
false nulls as being sampled from a single overall distribution. 

Although our focus is the evaluation of p-values under constraints, we
start with Section 2 on FDR control using maximum probabilities without
constraints. That the BH procedure can control the FDR in this case is
known [3]. The purpose of the section is to set up a suitable framework for
the following sections, by giving a more general description of the BH procedure
and indicating where constrained maximization may be introduced.

Section 3 considers two ways to compute p-values. The first one is
sequential, such that the p-value of each observation is obtained under linear
constraints imposed by observations whose p-values have already been
computed. In the second one, in principle, the p-values can be computed for
the observations simultaneously under the linear constraints imposed by the
entire data. Both types of p-values are then processed by the BH procedure.
Analytically, it is easier to establish FDR control based on the first type of
p-values because the sequential computation allows one to use a stopping
time argument [15]. On the other hand, since there are more constraints
imposed on the second type of p-values, one might expect them to lead to more
improvement in multiple testing. However, the simulation study reported in
Section 4 indicates that the two types of p-values lead to similar performance
of multiple testing. Some possible explanations for this will be given at the
end of Section 4. The study shows that the BH procedure is substantially
more powerful when using the two types of p-values than when using p-values
computed by the usual unconstrained maximization. In addition to power,
we will also compare the FDR and positive FDR (pFDR) realized by the
p-values.

The results in Section 4 indicate that in general, for the case of composite 
nulls, the prior on the null distributions cannot be estimated consistently. 
Basically, this is because the constraints imposed by the data cannot yield 
exact details of the prior and also because the above two ways to evaluate 
p-values usually select different linear combinations of the null distributions 
for different observations. This is in contrast to the simple case, where the 
fraction of true nulls can be estimated consistently [2, 8, 15]. Conceptually 
it is of interest to ask whether there are conditions that allow the prior of 
the null distributions to be estimated consistently. In Section 5, for the case
where there are only a finite number of null distributions, a necessary and 
sufficient condition will be given for the consistent estimation of the prior 
using maximum likelihood estimation (MLE). Note that, in the MLE, the 
distribution under false nulls is unknown, and the data are treated as though 
all are sampled from true nulls. An example will be given to show that for 
any finite set of linearly independent null distributions, one can construct a 
large class of distributions that satisfy the condition. 

Section 6 contains a brief discussion. Most technical details are collected 
in the Appendix. 

1.1. Assumptions and notation. Let $\{F_\theta, \theta \in \Theta\}$ be a family of
distributions on $\mathbb{R}^d$. Given random observations $X_1, \ldots, X_n \in \mathbb{R}^d$, the composite
nulls to be tested are

$H_i : X_i \sim F_\theta$ for some $\theta \in \Theta$.

Each $F_\theta$ is a null distribution.

Our discussion will be under the following random mixture model. The
distribution under false nulls is $G \notin \{F_\theta, \theta \in \Theta\}$ and the fraction of false
nulls among all nulls is $a \in (0,1)$. There is a prior probability measure
$\nu$ on $\Theta$. The data are sampled as follows. Define a probability measure $\mu$
on $\Theta \cup \{*\}$, where $*$ is any element not in $\Theta$, such that $\mu(\{*\}) = a$ and
$\mu(A) = (1 - a)\nu(A)$ for $A \subset \Theta$. Sample $\eta_1, \ldots, \eta_n$ iid $\sim \mu$. If $\eta_i = *$, then
sample $X_i \sim G$; otherwise, sample $X_i \sim F_{\eta_i}$. Thus $\eta_i$ can be thought of as
the identity of $X_i$, indexing the distribution $X_i$ is sampled from.
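The sampling scheme above can be illustrated with a minimal Python sketch for a finite family of nulls. The function name `sample_mixture`, the use of `None` to stand for the element $*$, and the particular normal distributions are assumptions made only for illustration; they are not part of the model's definition.

```python
import random

def sample_mixture(n, a, nu, null_samplers, alt_sampler, rng):
    """Draw (eta_i, X_i), i = 1..n, from the random mixture model.

    eta_i is None for a false null (the element "*"); otherwise it is the
    index k of the null distribution F_k that X_i is drawn from.
    """
    etas, xs = [], []
    for _ in range(n):
        if rng.random() < a:
            # False null, chosen with probability a: X_i ~ G.
            etas.append(None)
            xs.append(alt_sampler(rng))
        else:
            # True null: pick k with prior probability nu[k], then X_i ~ F_k.
            k = rng.choices(range(len(nu)), weights=nu)[0]
            etas.append(k)
            xs.append(null_samplers[k](rng))
    return etas, xs

rng = random.Random(0)
# Illustrative setting: nulls N(0,1), N(-1,1), N(-2,1); alternative N(-4,1).
nulls = [lambda r, m=m: r.gauss(m, 1.0) for m in (0.0, -1.0, -2.0)]
etas, xs = sample_mixture(5000, 0.05, [0.75, 0.15, 0.10], nulls,
                          lambda r: r.gauss(-4.0, 1.0), rng)
frac_false = sum(e is None for e in etas) / len(etas)
```

In this sketch `frac_false` is the realized fraction of false nulls, which fluctuates around $a$ from sample to sample.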

Throughout we will make two assumptions. First, $\nu$ is unknown. Indeed, if
$\nu$ is known, then under true $H_i$, $X_i \sim F = \int F_\theta\, \nu(d\theta)$ and thus the composite
null can be reduced to a simple null. Second, $G$ is unknown. This assumption
is especially intended for the case where $\Theta$ is finite. Indeed, if $G$ is known,
then for $n \gg 1$, both $a$ and $\nu$ can be estimated accurately by the MLE,
which reduces the testing problem to one only involving simple nulls.

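Recall that for a multiple testing procedure, if $R$ is the number of rejected
nulls, and $V$ that of rejected true nulls, then

$\mathrm{FDR} = E\left[\frac{V}{R \vee 1}\right], \qquad \mathrm{pFDR} = E\left[\frac{V}{R} \,\Big|\, R > 0\right].$

Furthermore, if there are $n$ nulls and $N$ of them are true, then

$\mathrm{power} = E\left[\frac{R - V}{(n - N) \vee 1}\right].$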
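As a hedged illustration of how these three quantities can be estimated from repeated simulations, the following Python sketch averages the realized proportions over runs. The function names `realized_rates` and `summarize` and the input layout are hypothetical.

```python
def realized_rates(rejected, is_true_null):
    """Return (V / (R v 1), (R - V) / ((n - N) v 1)) for one data set.

    rejected[i]     -- True if H_i is rejected
    is_true_null[i] -- True if H_i is a true null
    """
    n = len(rejected)
    R = sum(rejected)                                          # rejections
    V = sum(r and t for r, t in zip(rejected, is_true_null))   # false rejections
    N = sum(is_true_null)                                      # true nulls
    return V / max(R, 1), (R - V) / max(n - N, 1)

def summarize(runs):
    """Monte Carlo estimates of (FDR, pFDR, power) from a list of runs,
    each run being a pair (rejected, is_true_null)."""
    stats = [realized_rates(rej, null) for rej, null in runs]
    fdr = sum(f for f, _ in stats) / len(stats)
    power = sum(p for _, p in stats) / len(stats)
    # pFDR conditions on R > 0: average only over runs with a rejection.
    pos = [f for (f, _), (rej, _) in zip(stats, runs) if any(rej)]
    pfdr = sum(pos) / len(pos) if pos else float("nan")
    return fdr, pfdr, power

runs = [
    ([True, True, False, False], [True, False, False, True]),    # R = 2, V = 1
    ([False, False, False, False], [True, False, False, True]),  # R = 0
]
fdr, pfdr, power = summarize(runs)
```

In this toy input the second run has no rejections, so it contributes 0 to the FDR average but is excluded from the pFDR average, which is why the two estimates differ.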

2. Testing based on maximum probabilities. Usually, the description
of a multiple testing procedure starts with p-values, treating them as
already available. For our discussion later, it is useful to start with how
p-values are computed. The p-values are absent in the continuous version of
our description, but explicit in the discrete version.
Let $\{D_t : t \in \mathcal{I}\}$ be a family of Borel sets in $\mathbb{R}^d$ satisfying the following
conditions, where $\mathcal{I} \neq \emptyset$ is an open interval in $\mathbb{R}$.

D1. The family is increasing and right-continuous, i.e. $D_t = \bigcap_{s > t,\, s \in \mathcal{I}} D_s$ for $t \in \mathcal{I}$.
D2. $\bigcup_{t \in \mathcal{I}} D_t = \mathbb{R}^d$.
D3. $G\left(\bigcap_{t \in \mathcal{I}} D_t\right) = F_\theta\left(\bigcap_{t \in \mathcal{I}} D_t\right) = 0$, $\theta \in \Theta$.

For each $\theta \in \Theta$, define

(2.1)  $\phi_\theta(t) = \begin{cases} F_\theta(D_t) & \text{if } t \in \mathcal{I}, \\ 0 & \text{if } t \le \inf \mathcal{I}, \\ 1 & \text{if } t \ge \sup \mathcal{I}, \end{cases}$

i.e., $\phi_\theta(t)$ is the significance level of the region $D_t$ under $F_\theta$. By D2 and D3,
$\phi_\theta$ is nondecreasing and continuous at $\inf \mathcal{I}$ and $\sup \mathcal{I}$. Denote

(2.2)  $M(t) = \sup_{\theta \in \Theta} \phi_\theta(t)$,

i.e., $M(t)$ is the significance level of $D_t$ associated with $\{F_\theta, \theta \in \Theta\}$. It is
nondecreasing with $M(t) = 0$ for $t \le \inf \mathcal{I}$ and $M(t) = 1$ for $t \ge \sup \mathcal{I}$.

We can regard $M(t)$ as $\sup_\mu \int \phi_\theta(t)\, d\mu(\theta)$, where the supremum is taken
over all possible probability measures $\mu$ on $\Theta$. By our assumption, there
is a prior $\nu$ on $\Theta$. If there is no information on the value of $\nu$, then the
supremum is justified. If, on the other hand, it is known that $\nu$ satisfies
certain conditions, then it makes sense to use the conditions to constrain
the supremum, even though the conditions may not uniquely determine $\nu$.
This may yield an $M(t)$ closer to $\int \phi_\theta(t)\, d\nu(\theta)$, which improves the performance
of multiple testing.

Once $M(t)$ is in place, the BH procedure can be applied. The procedure
can be described in two ways. The continuous version features a stopping
time that may simplify the analysis of FDR control (cf. [15]), while the
discrete one is easier to implement. For $t \in \mathcal{I}$, denote

$R_n(t) = \sum_{i=1}^n 1\{X_i \in D_t\}, \qquad V_n(t) = \sum_{i=1}^n 1\{X_i \in D_t,\ \eta_i \in \Theta\}.$

Procedure 2.1 (Continuous version). Given control parameter $\alpha \in (0,1)$, let

$\mathcal{I}_\alpha = \left\{ t \in \mathcal{I} : \frac{M(t)}{\alpha} \le \frac{R_n(t) \vee 1}{n} \right\}.$

If $\mathcal{I}_\alpha \neq \emptyset$, set $\tau = \sup \mathcal{I}_\alpha$ and reject $H_i$ if and only if $X_i \in D_\tau$. Otherwise,
set $\tau = \inf \mathcal{I}$ and accept all $H_i$. □

To describe the discrete version of Procedure 2.1, define

(2.3)  $s(x) = \inf\{t \in \mathcal{I} : x \in D_t\}, \qquad s_i = s(X_i), \quad i = 1, \ldots, n.$

By D2, the set in (2.3) is nonempty, so $s(x)$ is well-defined and $s(x) < \sup \mathcal{I}$.
Proposition 2.1. Under D1-3, the following statements hold.

1) $s_i \in \mathcal{I}$ almost surely.

2) For any $t \in \mathcal{I}$, $s_i \le t \iff X_i \in D_t$ and hence $R_n(t) = \sum_{i=1}^n 1\{s_i \le t\}$.

3) Given $\theta$, if $X_i \sim F_\theta$, then $s_i \sim \phi_\theta$.

4) For $i = 1, \ldots, n$, the distribution function of $s_i$ is

$Q(t) = (1 - a) \int \phi_\theta(t)\, \nu(d\theta) + a\,G(D_t).$

5) If $\phi_\theta \in C(\mathbb{R})$ for all $\theta$, then $M(t)$ is left-continuous.

By Proposition 2.1, $\phi_\theta(s_i)$ is the p-value of $X_i$ under $F_\theta$. Therefore, $M(s_i)$
can be used as a p-value under the composite null $H_i$ [10].

Procedure 2.2 (Discrete version). Let $s_{(1)} \le \cdots \le s_{(n)}$ be the order
statistics of $s_i$ and $s_{(0)} = \inf \mathcal{I}$. Reject $H_i$ if and only if $s_i \le s_{(R)}$, where

$R = \max\left\{ i \ge 0 : \frac{M(s_{(i)})}{\alpha} \le \frac{i}{n} \right\}.$ □

Proposition 2.2. Suppose $\phi_\theta \in C(\mathbb{R})$ for all $\theta$. Then Procedures 2.1
and 2.2 are the same, and both have $\mathrm{FDR} \le (1 - a)\alpha$.
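The discrete version can be sketched in a few lines of Python. This is only an illustration under stated assumptions: lower-tail regions $D_t = (-\infty, t]$, so that $s_i = X_i$, a finite list of null distribution functions, and hypothetical helper names `norm_cdf` and `bh_max_pvalues`.

```python
import math

def norm_cdf(x, mean=0.0, sd=1.0):
    """Normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def bh_max_pvalues(xs, null_cdfs, alpha):
    """Sketch of Procedure 2.2 with D_t = (-inf, t], so that s_i = X_i and
    M(s_i) = max_k F_k(X_i). Returns the sorted indices of rejected nulls."""
    n = len(xs)
    pvals = [max(F(x) for F in null_cdfs) for x in xs]
    # M is nondecreasing in s, so ordering by p-value matches ordering by s_i.
    order = sorted(range(n), key=lambda i: pvals[i])
    R = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / n:   # M(s_(i)) / alpha <= i / n
            R = rank                       # keep the largest such rank
    return sorted(order[:R])

null_cdfs = [lambda x: norm_cdf(x, 0.0), lambda x: norm_cdf(x, -1.0)]
rejected = bh_max_pvalues([-5.0, -4.5, 0.3, 1.2, -0.2], null_cdfs, alpha=0.25)
```

With these five illustrative observations only the two most negative ones have maximum p-values below their BH thresholds, so `rejected` contains indices 0 and 1.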

In single hypothesis tests, nested rejection regions are usually indexed by 
significance level. For FDR control, other indices can be used. This allows 
one to think about the rejection regions in more natural terms and also 
avoids problems when different regions have the same significance levels. 

Example 2.1. Suppose $X_i \in \mathbb{R}$. To use lower-tail probabilities as p-values,
set $D_t = (-\infty, t]$, $t \in \mathcal{I} = \mathbb{R}$. Then $s_i = X_i$ and $\phi_\theta(s_i) = F_\theta(X_i)$.
To use upper-tail probabilities as p-values, set $D_t = [-t, \infty)$, $t \in \mathcal{I} = \mathbb{R}$.
Then $s_i = -X_i$ and $\phi_\theta(s_i) = F_\theta([-s_i, \infty)) = F_\theta([X_i, \infty))$. Suppose each $F_\theta$
is continuous at 0. If we use $D_t = [-t, t]$, $t \in \mathcal{I} = [0, \infty)$, then $s_i = |X_i|$ and
$\phi_\theta(s_i) = F_\theta([-|X_i|, |X_i|])$. □
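The three choices of $D_t$ in Example 2.1 can be checked numerically. The sketch below assumes, purely for illustration, that $F_\theta$ is the $N(\theta, 1)$ distribution; the function names are hypothetical, and each function returns the pair $(s, \phi_\theta(s))$ for a single observation.

```python
import math

def norm_cdf(x, mean=0.0):
    """Distribution function of N(mean, 1)."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))

def lower_tail(x, mean):
    """D_t = (-inf, t]: s = x and phi(s) = F((-inf, x])."""
    return x, norm_cdf(x, mean)

def upper_tail(x, mean):
    """D_t = [-t, inf): s = -x and phi(s) = F([x, inf))."""
    return -x, 1.0 - norm_cdf(x, mean)

def central_interval(x, mean):
    """D_t = [-t, t]: s = |x| and phi(s) = F([-|x|, |x|])."""
    r = abs(x)
    return r, norm_cdf(r, mean) - norm_cdf(-r, mean)

s_low, p_low = lower_tail(-1.0, 0.0)
s_up, p_up = upper_tail(-1.0, 0.0)
s_mid, p_mid = central_interval(-1.0, 0.0)
```

Note that with $D_t = [-t, t]$ the index $\phi_\theta(s)$ is small for observations close to 0, so the regions $D_t$, not the tails, determine which observations count as extreme.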

3. Testing based on constrained maximum probabilities.

3.1. Outlines. Testing using maximum probabilities can be very
conservative. Our goal is to find alternative methods when $\Theta$ is a finite set
$\{\theta_k, k = 1, \ldots, L\}$. The probability measure $\nu$ on $\Theta$ can now be specified by
$\boldsymbol{\nu} = (\nu_1, \ldots, \nu_L)^T$ with $\nu_k = \nu(\{\theta_k\})$. Henceforth, a letter in boldface will
stand for an $L$-dimensional vector. Denote $\phi_k(t) = \phi_{\theta_k}(t)$. In this section,
we assume that all $F_k$ and hence all $\phi_k(t)$ are continuous. Denote

$\mathbb{F}_n(t) = R_n(t)/n,$

i.e. the empirical distribution based on $s_1, \ldots, s_n$ defined in (2.3).
Instead of $M(t) = \max_k \phi_k(t)$ as in Procedure 2.1, for finite $\Theta$, the
proposed functions to use have the general form

$M_n(t) = \sup\{\boldsymbol{c}^T \boldsymbol{\phi}(t) : \boldsymbol{c} \in \mathcal{C},\ \boldsymbol{c}^T \boldsymbol{\phi} \in \mathcal{A}_{n,t}\},$

where $\mathcal{C}$ is a suitable subset of

$\Delta = \{\boldsymbol{c} \in [0,1]^L : c_1 + \cdots + c_L \le 1\}$

and for $n \ge 1$ and $t \in \mathcal{I}$, $\mathcal{A}_{n,t}$ is a family of functions on $\mathcal{I}$. In general, $\mathcal{C}$
is constructed based on deterministic knowledge on $\boldsymbol{\nu}$ and $a$. On the other
hand, $\mathcal{A}_{n,t}$ is constructed based on the data and hence both $M_n(t)$ and $\mathcal{A}_{n,t}$
may be random. If $\mathcal{C} = \Delta$ and $\mathcal{A}_{n,t}$ is the entire family of functions on $\mathcal{I}$, then
$M_n(t)$ is $\max_j \phi_j(t)$ and we recover Procedure 2.1. By adding conditions to
make $\mathcal{C}$ or $\mathcal{A}_{n,t}$ smaller, $M_n(t)$ can be smaller than $\max_j \phi_j(t)$, which may
result in higher power. In particular, if $\mathcal{C} = \{\boldsymbol{\nu}\}$, then $M_n(t) = \boldsymbol{\nu}^T \boldsymbol{\phi}(t)$,
which reduces the testing problem to the one for simple nulls.

Oftentimes, there is no direct knowledge on $\boldsymbol{\nu}$ or $a$, so one has to set $\mathcal{C} = \Delta$;
constraints on $\boldsymbol{c}$ are indirectly imposed through the condition $\boldsymbol{c}^T \boldsymbol{\phi} \in \mathcal{A}_{n,t}$.
Then $M_n(t)$ takes the form

(3.1)  $M_n(t) = \sup\{\boldsymbol{c}^T \boldsymbol{\phi}(t) : \boldsymbol{c} \in \Delta,\ \boldsymbol{c}^T \boldsymbol{\phi} \in \mathcal{A}_{n,t}\}.$

In Section 4, we will consider the case where $\mathcal{C}$ can be chosen smaller than
$\Delta$, and in Section 5, a case where substantial knowledge on $\boldsymbol{\nu}$ can be attained
by estimation will be considered.

Recall that $(1 - a)\boldsymbol{\nu}^T \boldsymbol{\phi}(t)$ is the population fraction of true nulls with
$X_i \in D_t$. In order for $M_n(t)$ not to underestimate the fraction, a basic
requirement is $M_n(t) \ge (1 - a)\boldsymbol{\nu}^T \boldsymbol{\phi}(t)$. In general, since $\mathcal{A}_{n,t}$ is random, this
requires that $\mathcal{A}_{n,t}$ have the property that as long as $n$ is large enough, with
probability close to 1, $(1 - a)\boldsymbol{\nu}^T \boldsymbol{\phi} \in \mathcal{A}_{n,t}$ for all $t \in \mathcal{I}$.

A basic fact to use in order to satisfy the condition is that, almost surely,
as $n \to \infty$,

$\sup_t |\mathbb{F}_n(t) - Q(t)| \to 0,$

where $Q(t)$ is the distribution function of $s_i = s(X_i)$ defined in (2.3), i.e.
$Q(t) = (1 - a)\boldsymbol{\nu}^T \boldsymbol{\phi}(t) + a\,G(D_t)$.

Then, with probability close to 1, $(1 - a)\boldsymbol{\nu}^T \boldsymbol{\phi}(t)$ is less than $\mathbb{F}_n(t)$ plus a small
margin. Moreover, $Q(t) - (1 - a)\boldsymbol{\nu}^T \boldsymbol{\phi}(t) = a\,G(D_t)$ is increasing in $t$. Then
for $n \gg 1$, with probability close to 1,

$\mathbb{F}_n(u) - (1 - a)\boldsymbol{\nu}^T \boldsymbol{\phi}(u) \ge \mathbb{F}_n(v) - (1 - a)\boldsymbol{\nu}^T \boldsymbol{\phi}(v) - \varepsilon_n$, for all $u \ge v$.

Therefore, in calculating $M_n(t)$, the maximization can be constrained to
those $\boldsymbol{c}$ such that, when they replace $(1 - a)\boldsymbol{\nu}$, the inequalities still hold.
3.2. Construction using data sequentially. Given the relative ease of
establishing FDR control by using a stopping time as the random cut-off for
rejection, we first consider a construction of $\mathcal{A}_{n,t}$ that allows a stopping
time to be defined.

Incorporating the facts just discussed, a basic form of $\mathcal{A}_{n,t}$ is

(3.2)  $\mathcal{A}_{n,t} = \left\{ h \in C(\mathcal{I}) : \begin{array}{l} h(s_i) \le \mathbb{F}_n(s_i) + \varepsilon_n \text{ for } s_i \ge t, \\ \mathbb{F}_n(t_2) - \mathbb{F}_n(t_1) \ge h(t_2) - h(t_1) - \varepsilon_n \text{ for } t_1, t_2 \in T_n \text{ with } t \le t_1 < t_2 \end{array} \right\},$

where $T_n \subset \mathcal{I}$ is a finite set of points. Although $T_n$ can contain any number
of points, to reduce computation, the number of points in $T_n$ needs to be
relatively small.

It is easy to see that $M_n(t) = 0$ if $t \le \inf \mathcal{I}$. Some other useful properties of
$M_n(t)$ are as follows.

Lemma 3.1. $M_n$ is always nondecreasing. Furthermore, if $\phi_k \in C(\mathbb{R})$ for
all $k$, then almost surely, 1) $M_n$ is continuous at every $t$ other than $s_1, \ldots, s_n$
and 2) it is left-continuous and has a right-hand limit at each $s_i$.

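The continuous and discrete versions of the BH procedure using $M_n(t)$ are
described below. As with Procedures 2.1 and 2.2, the two versions are equivalent.
As in Procedure 2.1, the random variable $\tau$ in the continuous version is a
stopping time.

Procedure 3.1. Given control parameter $\alpha \in (0, 1)$, let

$\mathcal{I}_\alpha = \left\{ t \in \mathcal{I} : \frac{M_n(t)}{\alpha} \le \frac{R_n(t) \vee 1}{n} \right\}.$

If $\mathcal{I}_\alpha \neq \emptyset$, set $\tau = \sup \mathcal{I}_\alpha$ and reject $H_i$ if and only if $s_i \le \tau$. Otherwise, set
$\tau = \inf \mathcal{I}$ and accept all $H_i$.

Equivalently, sort $s_i$ into $s_{(1)} \le \cdots \le s_{(n)}$ and set $s_{(0)} = \inf \mathcal{I}$. Reject $H_i$
if and only if $s_i \le s_{(R)}$, where

$R = \max\left\{ i \ge 0 : \frac{M_n(s_{(i)})}{\alpha} \le \frac{i}{n} \right\}.$ □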

For each $i$, $M_n(s_{(i)})$ is the maximum of $\boldsymbol{c}^T \boldsymbol{\phi}(s_{(i)})$, with $c_k$ satisfying

1) $c_k \ge 0$, $\sum_k c_k \le 1$;

2) $\boldsymbol{c}^T \boldsymbol{\phi}(s_{(j)}) \le \mathbb{F}_n(s_{(j)}) + \varepsilon_n$ for $j \ge i$;

3) $\mathbb{F}_n(t_2) - \mathbb{F}_n(t_1) \ge \sum_{k=1}^L c_k[\phi_k(t_2) - \phi_k(t_1)] - \varepsilon_n$ for $t_1, t_2 \in T_n$ with
$s_{(i)} \le t_1 < t_2$.

All the constraints are linear. As a result, $M_n(s_{(i)})$ can be computed by
linear programming. The computation is termed sequential because each
$M_n(s_{(i)})$ is computed based on the data not smaller than $s_{(i)}$. Therefore, if we
imagine that $s_{(j)}$ are input one by one, starting with the largest one, then
$M_n(s_{(i)})$ can be computed only after all $s_{(j)}$, $j \ge i$, have been input.
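To make the constrained maximization concrete, the following Python sketch approximates $M_n(t)$ under constraints 1)-3). It deliberately swaps linear programming for a brute-force search over a grid on $\Delta$, which is only practical for very small $L$ and is not the computation used in the article. The function names, the normal nulls with unit variance, the grid resolution `steps`, and the toy data are all assumptions made for illustration.

```python
import itertools
import math
import random

def norm_cdf(x, mean):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))

def simplex_grid(L, steps):
    """Grid points c with c_k >= 0 and c_1 + ... + c_L <= 1 (constraint 1)."""
    for ks in itertools.product(range(steps + 1), repeat=L):
        if sum(ks) <= steps:
            yield [k / steps for k in ks]

def sequential_M(t, s, null_means, eps, T, steps=20):
    """Grid approximation of M_n(t) for D_t = (-inf, t] and N(mean, 1) nulls."""
    n = len(s)
    Fn = lambda u: sum(si <= u for si in s) / n            # empirical d.f.
    phi = lambda u: [norm_cdf(u, m) for m in null_means]   # (phi_1, ..., phi_L)
    dot = lambda c, v: sum(ck * vk for ck, vk in zip(c, v))
    # Constraint 2): c'phi(s_j) <= F_n(s_j) + eps for data points s_j >= t.
    pts = [(phi(si), Fn(si)) for si in s if si >= t]
    # Constraint 3): increments over pairs t1 < t2 in T_n with t1 >= t.
    Tt = sorted(u for u in T if u >= t)
    incs = [([q2 - q1 for q1, q2 in zip(phi(u1), phi(u2))], Fn(u2) - Fn(u1))
            for u1, u2 in itertools.combinations(Tt, 2)]
    phit, best = phi(t), 0.0          # c = 0 is always feasible
    for c in simplex_grid(len(null_means), steps):
        ok = (all(dot(c, p) <= f + eps for p, f in pts) and
              all(dot(c, d) <= df + eps for d, df in incs))
        if ok:
            best = max(best, dot(c, phit))
    return best

rng = random.Random(1)
n = 300
# Toy data: 5% from N(-4,1); the rest from N(0,1) or N(-1,1) with prior (.75, .25).
s = [rng.gauss(-4.0, 1.0) if rng.random() < 0.05
     else rng.gauss(rng.choice([0.0, 0.0, 0.0, -1.0]), 1.0) for _ in range(n)]
eps = math.sqrt(math.log(n) / n)
lo, hi, m = min(s), max(s), int(math.log(n) ** 2)
T = [lo + j * (hi - lo) / (m - 1) for j in range(m)]
t = sorted(s)[10]
M_seq = sequential_M(t, s, [0.0, -1.0], eps, T)
M_max = max(norm_cdf(t, mu) for mu in (0.0, -1.0))
```

Because every grid point lies in $\Delta$, `M_seq` can never exceed the unconstrained `M_max`, and because `t` is itself a data point, constraint 2) caps `M_seq` at $\mathbb{F}_n(t) + \varepsilon_n$. A linear programming solver would return the exact maximum rather than a grid approximation from below.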

The FDR control of Procedure 3.1 is given in the next result. The main
tools for the proof are a martingale stopping time argument and the
Dvoretzky-Kiefer-Wolfowitz (DKW) inequality [12].

Theorem 3.1. Suppose 1) $\phi_k \in C(\mathbb{R})$, 2) $\boldsymbol{\nu}^T \boldsymbol{\phi}(t) > 0$ for all $t \in \mathcal{I}$ and
3) $G(D_t)$ is continuous in $t$. Then for $n \ge 1$, provided $\exp(-2n\varepsilon_n^2) < 1/2$,
Procedure 3.1 satisfies

$\mathrm{FDR} \le \alpha + 2(1 + |T_n|) \exp(-2n\varepsilon_n^2) + E\left[\frac{1\{R > 0\}}{R \vee 1}\right].$

The bound contains terms in addition to $\alpha$. For appropriate $\varepsilon_n$ and $T_n$,
the term $2(1 + |T_n|) \exp(-2n\varepsilon_n^2)$ is $o(1)$ as $n \to \infty$. Under certain
conditions, $R$ is of the same order as $n$ and hence the bound shows that the FDR can be
asymptotically controlled at $\alpha$. However, the simulation study in Section 4
indicates that usually the realized FDR is substantially lower than $\alpha$, which
is reasonable because $M_n(t)$ is an overestimate of $(1 - a)\boldsymbol{\nu}^T \boldsymbol{\phi}(t)$.
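For a sense of scale, the second term of the bound can be evaluated directly. The sketch below assumes the choices $\varepsilon_n = \sqrt{\ln n / n}$ and $|T_n| = \lfloor (\ln n)^2 \rfloor$ that are used in the simulations of Section 4; the function name `dkw_term` is hypothetical.

```python
import math

def dkw_term(n):
    """Evaluate 2 (1 + |T_n|) exp(-2 n eps_n^2) for eps_n = sqrt(ln n / n)
    and |T_n| = floor((ln n)^2)."""
    eps_sq = math.log(n) / n                    # eps_n^2
    size_T = math.floor(math.log(n) ** 2)       # |T_n|
    return 2 * (1 + size_T) * math.exp(-2 * n * eps_sq)

# With this eps_n, exp(-2 n eps_n^2) = n^(-2), so the term is 2 (1 + |T_n|) / n^2.
# For n = 5000, |T_n| = 72 and the term equals 2 * 73 / 5000**2 = 5.84e-6.
for n in (100, 1000, 5000):
    print(n, dkw_term(n))
```

Under these choices the term decays like $(\ln n)^2 / n^2$, so for large $n$ the bound is governed by $\alpha$ and by the expectation term involving $R$.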

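3.3. Construction using entire data. In place of $\mathcal{A}_{n,t}$, which depends on
$t$, we can use a single family of functions $\mathcal{A}_n$. In order to impose the maximum
amount of linear constraints, $\mathcal{A}_n$ should incorporate all $X_i$. Based on the
same considerations underlying (3.2), we define

(3.3)  $\mathcal{A}_n = \left\{ h \in C(\mathcal{I}) : \begin{array}{l} h(s_i) \le \mathbb{F}_n(s_i) + \varepsilon_n \text{ for all } s_i, \\ \mathbb{F}_n(t_2) - \mathbb{F}_n(t_1) \ge h(t_2) - h(t_1) - \varepsilon_n \text{ for } t_1, t_2 \in T_n \text{ with } t_1 < t_2 \end{array} \right\}.$

Corresponding to (3.1), for $t \in \mathcal{I}$, define

(3.4)  $M_n(t) = \sup\{\boldsymbol{c}^T \boldsymbol{\phi}(t) : \boldsymbol{c} \in \Delta,\ \boldsymbol{c}^T \boldsymbol{\phi} \in \mathcal{A}_n\}.$

It is easy to see that $M_n$ is nondecreasing. Therefore, corresponding to
Procedure 3.1, the following BH procedure obtains.

Procedure 3.2. Given control parameter $\alpha \in (0, 1)$, let

$\mathcal{I}_\alpha = \left\{ t \in \mathcal{I} : \frac{M_n(t)}{\alpha} \le \frac{R_n(t) \vee 1}{n} \right\}.$

If $\mathcal{I}_\alpha \neq \emptyset$, set $\tau = \sup \mathcal{I}_\alpha$ and reject $H_i$ if and only if $s_i \le \tau$. Otherwise, set
$\tau = \inf \mathcal{I}$ and accept all $H_i$.

Equivalently, sort $s_i$ into $s_{(1)} \le \cdots \le s_{(n)}$ and set $s_{(0)} = \inf \mathcal{I}$. Reject $H_i$
if and only if $s_i \le s_{(R)}$, where

$R = \max\left\{ i \ge 0 : \frac{M_n(s_{(i)})}{\alpha} \le \frac{i}{n} \right\}.$ □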
As in Procedure 3.1, $M_n(s_{(i)})$ can be computed using linear programming.
For comparison, we list the constraints for the maximization. For each $i$,
$M_n(s_{(i)})$ is the maximum of $\boldsymbol{c}^T \boldsymbol{\phi}(s_{(i)})$, with $c_k$ satisfying

1) $c_k \ge 0$, $\sum_k c_k \le 1$;

2) $\boldsymbol{c}^T \boldsymbol{\phi}(s_{(j)}) \le \mathbb{F}_n(s_{(j)}) + \varepsilon_n$ for all $j = 1, \ldots, n$;

3) $\mathbb{F}_n(t_2) - \mathbb{F}_n(t_1) \ge \sum_{k=1}^L c_k[\phi_k(t_2) - \phi_k(t_1)] - \varepsilon_n$ for all $t_1, t_2 \in T_n$ with
$t_1 < t_2$.

It is worth pointing out that although the set of constraints on $\boldsymbol{c}$ is the same
for all $s_i$, because $\boldsymbol{\phi}(s_i)$ are different for different $i$, the value of $\boldsymbol{c}$ that yields
$M_n(s_i)$ will in general be different.

Unlike in Procedures 2.1 and 3.1, since $\tau$ in Procedure 3.2 is determined
by the entire $s_1, \ldots, s_n$, it is not a stopping time. Because the martingale
stopping time argument cannot be used to establish FDR control for finite
$n$, we will work out an asymptotic statement instead.

For $s \in \mathbb{R}$ and $S \subset \mathbb{R}$, denote the distance from $s$ to $S$ by $d(s, S) =
\inf\{|s - t| : t \in S\}$. Define $\delta(S, T) = \sup\{d(s, S) : s \in T\}$ for $S, T \subset \mathbb{R}$.
A sequence $S_n$ of finite sets is said to be increasingly dense in $T$ if for any
$r > 0$, $\delta(S_n, T \cap [-r, r]) \to 0$ as $n \to \infty$.

Theorem 3.2. Suppose 1) all $\phi_k$ are continuous and $\boldsymbol{c}^T \boldsymbol{\phi}$ is strictly
increasing in $\mathcal{I}$, 2) $G(D_t)$ is continuous in $t$, and 3) as $n \to \infty$, $\varepsilon_n \to 0$,
$n\varepsilon_n^2 \to \infty$ and $T_n$ is increasingly dense in $\mathcal{I}$. Then, under Assumption A
given below, for Procedure 3.2, $\limsup_{n \to \infty} \mathrm{FDR} \le \alpha$.

Furthermore, asymptotically the procedure is equivalent to the one that
rejects $H_i$ if and only if $s_i \le t_*$, where $t_*$ is defined in (3.6) below.

Intuitively, as $n \to \infty$, in a certain sense $\mathcal{A}_n$ should tend to $\mathcal{A} = \{h \in C(\mathcal{I}) :
Q - h \ge 0 \text{ is nondecreasing}\}$. Consequently, $M_n(t)$ should tend to

(3.5)  $m(t) = \sup\{\boldsymbol{c}^T \boldsymbol{\phi}(t) : \boldsymbol{c} \in \Delta,\ \boldsymbol{c}^T \boldsymbol{\phi} \in \mathcal{A}\}.$

If this is true, then, as in [6], the asymptotics of the FDR as $n \to \infty$ may be
characterized by a fixed point derived from $m(t)$ and $Q(t)$. Let

(3.6)  $t_* = \sup\{t \in \mathcal{I} : m(t) \le \alpha Q(t)\}.$

Assumption A. $t_* \in \mathcal{I}$ and there is $t_0 < t_*$, such that $m(t) < \alpha Q(t)$
on $(t_0, t_*)$.

4. Numerical study. 

4.1. Setup. Because the properties of $M_n(t)$ in (3.1) and (3.4) are hard
to keep track of, it is difficult to analyze the power and pFDR of Procedures
3.1 and 3.2. We resort to numerical simulations to get a handle on these two
quantities. For comparison, Procedure 2.1 and the BH procedure with the
prior probabilities $\nu_1, \ldots, \nu_L$ being known are also included.

We only consider univariate observations. To use lower-tail p-values, we
set $D_t = (-\infty, t]$. By (2.3), if an observation $X$ takes value $x$, then $s(X) = x$
and hence $\phi_k(s(X)) = F_k(x)$, the left-tail p-value of $X$ under $F_k$. Also,
given observations $X_1, \ldots, X_n$, from $R_n(t) = \sum_{i=1}^n 1\{X_i \in D_t\}$, $R_n(X_i)$ is
the rank of $X_i$.

In each simulation, we draw iid samples $X_1, \ldots, X_n$ from a mixture
distribution $(1 - a) \sum_{k=1}^L \nu_k F_k(x) + a\,G(x)$, where $G \neq F_1, \ldots, F_L$. To test nulls

$H_i : X_i \sim F_k$ for some $k$, $i = 1, \ldots, n$,

we compute four types of p-values:

1) $p_{i,\mathrm{seq}} = M_n(X_i)$ defined by (3.1) and (3.2), where "seq" in the subscript
stands for "sequential", indicating that as the calculation of $M_n(X_i)$
proceeds to smaller $X_i$, linear constraints are added sequentially;

2) $p_{i,\mathrm{glb}} = M_n(X_i)$ defined by (3.3) and (3.4), where "glb" in the subscript
stands for "global", indicating that $M_n(X_i)$ are calculated under linear
constraints imposed by all $X_1, \ldots, X_n$;

3) $p_{i,\max} = \max_k F_k(X_i)$;

4) $p_{i,\mathrm{mix}} = \sum_k \nu_k F_k(X_i)$, i.e., the p-value of $X_i$ when the values of
$\nu_1, \ldots, \nu_L$ are known.

The computation of $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ is done by linear programming. By
(3.1) and (3.4), both are maxima of $\boldsymbol{c}^T \boldsymbol{F}(X_i) = c_1 F_1(X_i) + \cdots + c_L F_L(X_i)$.
In the simulations, the constraints are a little different from the basic ones
given in (3.2) and (3.3). However, the analysis is the same.

Denote by $\Gamma^*(z; \alpha, \beta)$ the $z$-th upper-tail quantile of the Gamma
distribution with shape parameter $\alpha$ and scale parameter $\beta$. For $i = 1, \ldots, n$, to
compute $p_{i,\mathrm{seq}}$, the constraints on $c_1, \ldots, c_L$ are

1) $c_k \ge 0$, $\sum_k c_k \le 1$;

2) $\boldsymbol{c}^T \boldsymbol{F}(X_j) \le u(X_j)$ for $X_j \ge X_i$, where $u(X_j)$ is a data-dependent upper
bound, defined through $\Gamma^*$ when $R_n(X_j) < n^{0.2}$ and equal to $\mathbb{F}_n(X_j) + \varepsilon_n$ otherwise;

3) $\mathbb{F}_n(t_2) - \mathbb{F}_n(t_1) \ge \boldsymbol{c}^T[\boldsymbol{F}(t_2) - \boldsymbol{F}(t_1)] - \varepsilon_n$ for $t_1, t_2 \in T_n$ with $X_i \le t_1 <
t_2$, where $\mathbb{F}_n(t) = R_n(t)/n$.

In all the simulations, $\varepsilon_n = \sqrt{\ln n / n}$ and $T_n$ consists of $\lfloor (\ln n)^2 \rfloor$ equally
spaced points with the first and last ones being $\min X_j$ and $\max X_j$.

The only difference between the above constraints and those in (3.2) is
the modified upper bound $u(X_j)$ when $R_n(X_j) < n^{0.2}$. This aims to impose
stronger constraints on $c_k$. In the definition of $u(X_j)$, $n^{0.2}$ can be changed to
any $a_n = o(n)$ and the scale parameter 1/0.95 to any $1/\beta$ with $\beta \in (0, 1)$.
As Appendix A.4 shows, at control parameter $\alpha$, Procedure 3.1 using $p_{i,\mathrm{seq}}$
computed under the above constraints obtains

(4.1)  $\mathrm{FDR} \le \alpha + r_n + E\left[\frac{1\{R > 0\}}{R \vee 1}\right]$

with $r_n \to 0$ as $n \to \infty$.

With similar modifications to (3.3), for $i = 1, \ldots, n$, to compute $p_{i,\mathrm{glb}}$, the
constraints on $c_1, \ldots, c_L$ are

1) $c_k \ge 0$, $\sum_k c_k \le 1$;

2) $\boldsymbol{c}^T \boldsymbol{F}(X_j) \le u(X_j)$ for all $X_j \ge X_i$; and

3) $\mathbb{F}_n(t_2) - \mathbb{F}_n(t_1) \ge \boldsymbol{c}^T[\boldsymbol{F}(t_2) - \boldsymbol{F}(t_1)] - \varepsilon_n$ for all $t_1, t_2 \in T_n$ with $t_1 < t_2$.

We then apply the BH procedure to the above p-values, specifically,
Procedure 3.1 to $p_{i,\mathrm{seq}}$, Procedure 3.2 to $p_{i,\mathrm{glb}}$, Procedure 2.2 to $p_{i,\max}$, and the
BH procedure to $p_{i,\mathrm{mix}}$. For each set of $F_1, \ldots, F_L$ and $G$, we draw 1000
iid samples of $X_1, \ldots, X_n$ with $n = 5000$. In this case, $r_n < 9.7 \times 10^{-3}$ in
(4.1); see Appendix A.4. The power, FDR and pFDR of each procedure are
calculated by averaging over the samples. Throughout, $a = 0.05$.

All the simulations are conducted in the R language [13]; $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ are
computed by the R linear programming package glpk.

4.2. Results. We conduct 5 groups of simulations. The parameters of the
simulations are shown in Table 1.

Simulation   $F_1, \ldots, F_L$                        $\nu_1, \ldots, \nu_L$      $G$
1            $N(0,1)$, $N(-1,1)$, $N(-2,1)$            .75, .15, .1                $N(-4,1)$
2            $t_{20}$, $t_{20,-1}$, $t_{20,-2}$        .75, .15, .1                $t_{20,-4}$
3            $N(0,1)$, $N(-1,1)$, $N(-2,1)$            .6, .25, .15                $N(-4,1)$
4            $N(0,1)$, $N(-1,1.5)$, $N(-2,1.5)$        .75, .15, .1                $N(-4,1)$
5            $N(\mu,1)$, $\mu = 0, -1, -2, -3, -4$     .65, .15, .1, .05, .05      $N(-5,1)$

Table 1

Parameters for the simulations. $F_k$ are null distributions, $\nu_k$ their prior probabilities, and
$G$ the distribution under false nulls. In each simulation, $a = 0.05$. $t_{n,c}$ denotes the
noncentral t distribution with $n$ df and noncentrality $c$.



The results of the simulations are summarized in Table 2. In all the
simulations, the control parameter $\alpha$ is equal to 0.25. As expected, because $p_{i,\mathrm{mix}}$
incorporate $\nu_1, \ldots, \nu_L$, which is information not accessible to the other types
of p-values, they yield the highest power by a substantial margin. On the
other hand, $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ yield substantially higher power than $p_{i,\max}$.
This shows that even when $\nu_1, \ldots, \nu_L$ are unknown, by utilizing properties
of empirical processes to reduce overestimation of p-values, the power of the
BH procedure can still be significantly improved.

In agreement with known results [1, 15], the FDR attained by using $p_{i,\mathrm{mix}}$
or $p_{i,\max}$ is close to or lower than $(1 - a)\alpha = 0.2375$. However, the large
gap between the FDR by using $p_{i,\max}$ and $(1 - a)\alpha$ indicates that testing
based on $p_{i,\max}$ can be very conservative. On the other hand, in all the
simulations, the FDR attained by using $p_{i,\mathrm{seq}}$ or $p_{i,\mathrm{glb}}$ lies between the above
two, substantially lower than the first one but substantially higher than
the second. Together with the simulation result on power, this shows that
multiple testing based on $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ is more conservative than based on
$p_{i,\mathrm{mix}}$, but can be much less conservative than based on $p_{i,\max}$.

                  Simulation 1                              Simulation 2
                  power       FDR          pFDR             power       FDR          pFDR
$p_{i,seq}$       .495        8.61×10^-2   8.61×10^-2       .236        8.57×10^-2   8.57×10^-2
$p_{i,glb}$       .494        8.60×10^-2   8.60×10^-2       .235        8.57×10^-2   8.57×10^-2
$p_{i,max}$       .223        2.55×10^-2   2.55×10^-2       .035        2.87×10^-2   3.18×10^-2
$p_{i,mix}$       .770        .238         .238             .634        .238         .238

                  Simulation 3                              Simulation 4
                  power       FDR          pFDR             power       FDR          pFDR
$p_{i,seq}$       .449        .103         .103             4.82×10^-4  6.95×10^-2   .465
$p_{i,glb}$       .449        .102         .102             4.82×10^-4  6.95×10^-2   .465
$p_{i,max}$       .229        3.77×10^-2   3.77×10^-2       8.42×10^-5  1.88×10^-2   .523
$p_{i,mix}$       .685        .236         .236             .144        .226         .259

                  Simulation 5
                  power       FDR          pFDR
$p_{i,seq}$       4.53×10^-2  6.42×10^-2   6.85×10^-2
$p_{i,glb}$       4.62×10^-2  6.51×10^-2   6.94×10^-2
$p_{i,max}$       3.22×10^-3  1.68×10^-2   4.00×10^-2
$p_{i,mix}$       .448        .239         .239

Table 2

Performance of the BH procedure applied to different types of p-values in simulations
1-5. In each simulation, the control parameter is set at $\alpha = 0.25$.

The conservativeness of multiple testing based on the p-values other than
$p_{i,\mathrm{mix}}$ does not necessarily help the control of pFDR. In simulations 1 and 3,
for each type of p-value, the power is relatively high, implying $P(R \ge 1) \approx 1$.
As a result, the pFDR is almost identical to the FDR. In simulations 2, 4
and 5, the power yielded by $p_{i,\max}$ is low (< .05), and, consistent with this,
the pFDR is substantially higher than the FDR. In contrast, in simulations 2
and 5, by using $p_{i,\mathrm{seq}}$ or $p_{i,\mathrm{glb}}$, the pFDR and FDR are similar to each other.
The worst case is simulation 4, where the pFDR is almost twice as high as
the control parameter $\alpha = .25$ when $p_{i,\mathrm{seq}}$ or $p_{i,\mathrm{glb}}$ are used. Observe that
in simulation 4, negative observations with large absolute values are more
likely to be associated with true nulls than with false nulls. This explains
the poor control of the pFDR by the BH procedure using $p_{i,\mathrm{seq}}$ or $p_{i,\mathrm{glb}}$.

To see in more detail why $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ in general yield better multiple
testing results than $p_{i,\max}$, we compare the plots of the p-values. Because
all the procedures in the study are variants of the BH procedure, it is more
informative to compare the plots of $p_{(i)}/(i/n) = np_{(i)}/i$ than to compare
those of $p_{(i)}$, $i = 1, \ldots, n$, where $p_{(i)}$ is the $i$th smallest p-value of a given
type. Figure 1 displays the plots of $np_{(i)}/i$ versus $i/n$ in the simulations,
where $p_{(i)}$ is the average over the repetitions. The figure clearly shows that
for small $i/n$, $np_{(i),\mathrm{seq}}/i$ and $np_{(i),\mathrm{glb}}/i$ are similar to each other, both are
substantially lower than $np_{(i),\max}/i$, and both increase more rapidly than
$np_{(i),\mathrm{mix}}/i$. This is consistent with the observation that multiple testing using
$p_{i,\mathrm{seq}}$ and that using $p_{i,\mathrm{glb}}$ perform similarly in terms of power, FDR and
pFDR, and in general both have higher power than multiple testing using
$p_{i,\max}$ at the same value of $\alpha$.

We next look at how $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ are computed by linear programming.
For each $p_{i,\mathrm{seq}}$ or $p_{i,\mathrm{glb}}$, denote by $c_{1,i}, \ldots, c_{L,i}$ the values of coefficients that
yield the p-values under the corresponding constraints. After the p-values are
sorted, let $c_{k,(i)}$ be the values corresponding to $p_{(i),\mathrm{seq}}$ or $p_{(i),\mathrm{glb}}$. We plot
$c_{k,(i)}$ versus $i/n$ for $k = 1, \ldots, L$. Figure 2 shows the plots for simulations 1
and 5. The plots for the other simulations are qualitatively similar. As can
be seen, although $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ in the simulations are similar, this is not
the case for the corresponding coefficients $c_{k,(i)}$. For each $k$, when $i/n$ is small,
$c_{k,(i)}$ for the two types of p-values are similar. However, as $i/n$ increases, to
compute $p_{i,\mathrm{seq}}$, essentially only one $c_k$ stays nonzero. In all the simulations,
this unique $c_k$ is associated with the last of the null distributions, i.e.,
$F_L$, which also has the smallest sup-norm distance from $G$ among all $F_k$.
In contrast, to compute $p_{i,\mathrm{glb}}$, more complicated combinations of $c_1, \ldots, c_L$
are picked. This difference between the coefficients for $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ may
be partially attributed to how linear programming is implemented by the
package used. However, it also indicates that linear programming may not yield
consistent estimation of $c_1, \ldots, c_L$.

Note that in Figure 2, for small $i/n$, the sum of $c_{k,(i)}$ is considerably
smaller than 0.4. Since $a = 1 - \sum_k c_k$, this would imply the fraction of
false nulls could be as high as 0.6, which is improbable in many cases. This
raises the possibility that, by imposing some constraint on the sum of $c_k$,
the power may be improved. Recall that $a = 0.05$ in the simulation study. We
simulate the scenario where it is known that $a \le 0.1$. For both
$p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$, the first constraint on
$c_1, \ldots, c_L$ is expanded to become

1') $c_k \ge 0$, $0.9 \le \sum_k c_k \le 1$.

Denote the p-values computed with the expanded linear constraints by
$p'_{i,\mathrm{seq}}$ and $p'_{i,\mathrm{glb}}$, and those computed previously
still by $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$. In Table 3, the power
and pFDR of the BH procedures when applied to the p-values are compared. In
all the cases, the FDR is substantially lower than $(1 - a)\alpha = 0.2375$
and hence not shown. In place of FDR, the standard deviation of
$(R - V)/(n - N)$ over 1000 repetitions is reported. Recall $R$ is the number
of rejections, $V$ that of false rejections, $n = 5000$ is the total number of
nulls, and $N$ is the number of true nulls. In simulations 1-3, there is a
small but significant increase in power by using $p'_{i,\mathrm{seq}}$ and
$p'_{i,\mathrm{glb}}$. This is not the case in simulations 4 and 5, where the
power is very low for all the 4 types of p-values.
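
The effect of the expanded constraint 1') can be illustrated with a small sketch. It is not the linear programming code used for the simulations. The distribution functions `phi`, the sample `s`, the margin `eps = 0.1`, and the function name `constrained_max` are all hypothetical; only the constraints $c^T\phi(s_j) \le F_n(s_j) + \epsilon_n$ for $s_j \ge t$ are imposed; and a brute-force grid over $c$ with $L = 2$ stands in for a linear programming package.

```python
import math


def phi(t):
    """Hypothetical null distribution functions on [0, 1]: phi_1(t) = t, phi_2(t) = t^2."""
    return (t, t * t)


def constrained_max(t, s, eps, sum_lower=0.0, step=0.01):
    """Grid approximation of sup c^T phi(t) subject to
    c_k >= 0, sum_lower <= c_1 + c_2 <= 1, and
    c^T phi(s_j) <= F_n(s_j) + eps for every observed s_j >= t.
    Returns None if no grid point is feasible."""
    n = len(s)
    s_sorted = sorted(s)
    # Empirical distribution function at each order statistic (no ties assumed).
    cons = [(phi(sj), (i + 1) / n + eps)
            for i, sj in enumerate(s_sorted) if sj >= t]
    m = round(1 / step)
    best = None
    for i in range(m + 1):
        for j in range(m + 1 - i):          # keeps c_1 + c_2 <= 1
            c1, c2 = i * step, j * step
            if c1 + c2 < sum_lower - 1e-12:
                continue
            if all(c1 * p[0] + c2 * p[1] <= bound for p, bound in cons):
                value = c1 * phi(t)[0] + c2 * phi(t)[1]
                best = value if best is None else max(best, value)
    return best


# Deterministic toy sample whose empirical distribution is close to phi_2.
n = 50
s = [math.sqrt((i - 0.5) / n) for i in range(1, n + 1)]

M = constrained_max(0.2, s, eps=0.1)                       # c_k >= 0, sum <= 1
M_prime = constrained_max(0.2, s, eps=0.1, sum_lower=0.9)  # expanded constraint 1')
print(M, M_prime)
```

In this toy setting the maximizer without the lower bound puts almost all weight on one coefficient, with a sum well below 0.9, so adding the lower bound shrinks the feasible set and yields a smaller value. This mirrors the mechanism by which the expanded constraint can lower the p-values.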
simul.  p-value       power        SD((R-V)/(n-N))   pFDR
1       p_{i,seq}     .495         5.20×10^-2        8.61×10^-2
1       p_{i,glb}     .494         5.20×10^-2        8.60×10^-2
1       p'_{i,seq}    .541         5.25×10^-2        .103
1       p'_{i,glb}    .541         5.25×10^-2        .103
2       p_{i,seq}     .236         6.42×10^-2        8.57×10^-2
2       p_{i,glb}     .235         6.42×10^-2        8.57×10^-2
2       p'_{i,seq}    .296         6.83×10^-2        9.94×10^-2
2       p'_{i,glb}    .296         6.83×10^-2        9.94×10^-2
3       p_{i,seq}     .449         5.35×10^-2        .103
3       p_{i,glb}     .449         5.35×10^-2        .102
3       p'_{i,seq}    .473         5.38×10^-2        .113
3       p'_{i,glb}    .473         5.38×10^-2        .113
4       p_{i,seq}     4.82×10^-4   1.76×10^-3        .465
4       p_{i,glb}     4.82×10^-4   1.76×10^-3        .465
4       p'_{i,seq}    6.26×10^-4   1.86×10^-3        .479
4       p'_{i,glb}    6.26×10^-4   1.86×10^-3        .479
5       p_{i,seq}     4.53×10^-2   3.38×10^-2        6.85×10^-2
5       p_{i,glb}     4.62×10^-2   4.53×10^-2        6.94×10^-2
5       p'_{i,seq}    4.65×10^-2   3.32×10^-2        6.89×10^-2
5       p'_{i,glb}    4.65×10^-2   3.32×10^-2        6.89×10^-2

Table 3

Performance of the BH procedure applied to p-values computed under different
linear constraints: $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ are the same
as in Table 2, $p'_{i,\mathrm{seq}}$ and $p'_{i,\mathrm{glb}}$ are computed
with the additional constraint $c_1 + \cdots + c_L \ge 0.9$. For each
simulation, $(R-V)/(n-N)$ is the fraction of rejected false nulls among all
false nulls in a repetition. The SD is obtained over 1000 repetitions.



In Figure 3, we compare the plots of $n\bar p_{(i)}/i$ for the p-values. Since
all rejections occur when $i \ll n$, we only compare the plots with
$i/n < 0.05$. It is seen that for small $i/n$, the plots for
$p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ are very close to each other,
explaining why the performances of the BH procedure based on the two types of
p-values are similar. Likewise, the plots for $p'_{i,\mathrm{seq}}$ and
$p'_{i,\mathrm{glb}}$ are very close to each other, and in simulations 1-3,
both are significantly lower than the plots of $p_{i,\mathrm{seq}}$ and
$p_{i,\mathrm{glb}}$, which explains the improved power yielded by
$p'_{i,\mathrm{seq}}$ and $p'_{i,\mathrm{glb}}$. Finally, comparing Figures 2
and 4, we can see that the extra constraint $c_1 + \cdots + c_L \ge 0.9$
substantially changes the plots of the coefficients. In particular, for
$p'_{i,\mathrm{seq}}$ with $i \ll n$, the linear programming sets two
coefficients nonzero, as opposed to only one for $p_{i,\mathrm{seq}}$.

From the above results, it is seen that the performances of the BH procedure
based on $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ are close to each
other, even though the latter are subject to more constraints. The reason
seems to lie in how $p_{(i),\mathrm{seq}}$ are computed. The evaluation of
$p_{(i),\mathrm{seq}}$ incorporates the constraints imposed by $s_{(j)}$ with
$j \ge i$. For small $i$, this set of constraints differs only by a small
fraction from the one imposed by the entire set of $s_{(j)}$. Under regular
conditions, constraints imposed by $s_{(j)}$ with $j < i$ will not change the
maximization substantially. This implies that for small $i$,
$p_{(i),\mathrm{seq}}$ and $p_{(i),\mathrm{glb}}$ are close to each other, as
can be seen from Figure 3. Since the BH procedure only rejects nulls with
small p-values, its performance based on either type of p-values will be
similar.
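
Because the BH procedure compares the sorted p-values with $\alpha i/n$ and only the small ones matter, the comparison above can be made concrete with a short sketch. The function names, the ten p-values, and the true/false labels below are hypothetical illustrations, not data from the simulation study; the level 0.25 matches the value of $\alpha$ implied by $(1 - a)\alpha = 0.2375$ with $a = 0.05$.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return the (sorted) indices rejected by the BH step-up procedure."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    cutoff = 0
    # Largest rank i (1-based) with p_(i) <= alpha * i / n; 0 if there is none.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / n:
            cutoff = rank
    return sorted(order[:cutoff])


def power_and_fdp(rejected, is_true_null):
    """Power (R - V)/(n - N) and false discovery proportion V / max(R, 1)."""
    R = len(rejected)
    V = sum(1 for i in rejected if is_true_null[i])
    n = len(is_true_null)
    N = sum(is_true_null)
    power = (R - V) / (n - N) if n > N else 0.0
    return power, V / max(R, 1)


pvals = [0.001, 0.02, 0.06, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
truth = [False, False, True, False, True, True, True, True, True, True]

rejected = benjamini_hochberg(pvals, alpha=0.25)
power, fdp = power_and_fdp(rejected, truth)
print(rejected, power, fdp)
```

Only the three smallest p-values fall below their thresholds here, so any two sets of p-values that agree on the small ranks lead to the same rejections.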

5. MLE for prior probabilities of nulls. Let
$\Theta = \{\theta_1, \ldots, \theta_L\}$. As indicated in Section 4, for
composite nulls, in general the prior $\nu$ may not be estimated consistently.
In this section, we consider under what conditions $\nu$ can be estimated
consistently. Under the setup in Section 1, suppose each $F_k$ has a density
$f_k$ and $G$ is absolutely continuous with respect to the distribution under
true nulls. By the Radon-Nikodym theorem, $G$ has a density
$p(x)\nu^T f(x)$ with $p(x) \ge 0$. Then the data $X_1, \ldots, X_n$ are iid
with density

$$q(x) = [1 - a + a p(x)]\,\nu^T f(x).$$

Pretending all the nulls are true, the MLE for $\nu$ is

$$\hat\nu_n = \arg\sup_{c \in S} \sum_{i=1}^n \ln[c^T f(X_i)],$$

where $S$ is a suitable set. Usually, one would choose
$S = \{c \in [0,1]^L : \sum c_k = 1\}$ because by the definition of prior
probabilities, $\nu_k \ge 0$ and $\sum \nu_k = 1$. For the reason described
below, we shall make the setting a little more general. Still suppose that the
distribution under true nulls is a linear combination of $F_k$. However, now
$\nu_k$ are allowed to be negative. In this setting, it is better to regard
$f_k$ merely as a basis for a set of densities. Then set

(5.1) $S = \{c : c_1 + \cdots + c_L = 1,\ c^T f \ge 0\}$.

A reason for this choice of $S$ can be seen when density functions under
true nulls are linearly dependent. In this case, it is desirable to pick a
basis from them, say $f_1, \ldots, f_L$, and represent the others as
$g_j = \sum_k \lambda_{jk} f_k$. By linear dependence, $\lambda_{jk}$ can be
negative. Let the mixture density under true nulls be $a^T f + b^T g$, with
$\sum a_k + \sum b_j = 1$ and $a_k, b_j \ge 0$. By representing it as
$\nu^T f$, we get $\nu_k = a_k + \sum_j b_j \lambda_{jk}$, which can be
negative. On the other hand, $\sum \nu_k = 1$ and $\nu^T f \ge 0$. Therefore,
$S$ in (5.1) contains $\nu$.
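
As a rough numerical illustration of the MLE defined above, the sketch below maximizes $\sum_i \ln[c^T f(X_i)]$ over the probability simplex, i.e., the usual choice of $S$ rather than the more general set (5.1). It uses the standard EM iteration for mixture weights with known component densities, which is a technique swapped in here for convenience and not a method taken from the article. The normal densities, the weights 0.3 and 0.7, the sample size, and the function names are hypothetical; the data are drawn from the null mixture alone, so all nulls are in fact true.

```python
import math
import random


def normal_pdf(x, mu, sd=1.0):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def mle_weights_em(data, densities, n_iter=200):
    """EM iterations for argmax_c sum_i ln[c^T f(X_i)] over the probability simplex."""
    L = len(densities)
    # f_k(X_i) never changes because the component densities are known.
    fx = [[f(x) for f in densities] for x in data]
    c = [1.0 / L] * L
    for _ in range(n_iter):
        new_c = [0.0] * L
        for row in fx:
            mix = sum(ck * fk for ck, fk in zip(c, row))
            for k in range(L):
                new_c[k] += c[k] * row[k] / mix   # posterior weight of component k
        c = [v / len(fx) for v in new_c]
    return c


random.seed(0)
densities = [lambda x: normal_pdf(x, -2.0), lambda x: normal_pdf(x, 2.0)]
data = [random.gauss(-2.0, 1.0) if random.random() < 0.3 else random.gauss(2.0, 1.0)
        for _ in range(2000)]

weights = mle_weights_em(data, densities)
print(weights)
```

Each EM step keeps the weights nonnegative and summing to one, so the iterates stay in the simplex; handling the set (5.1) with possibly negative coefficients would need a different optimizer.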

Recall that if $A \subset \mathbb{R}^d$, its interior is
$A^\circ = \{x : B(x, r) \subset A \text{ for some } r > 0\}$, where
$B(x, r) = \{z : |z_k - x_k| < r,\ k = 1, \ldots, d\}$. By this definition,
$S^\circ = \emptyset$. However, regarding $S$ as a subset in
$\{c : \sum c_k = 1\}$, we have
$S^\circ = \{c : \text{for some } r > 0,\ c + v \in S\ \forall v \in B(0, r) \text{ with } \sum v_k = 0\}$.
Both $S$ and $S^\circ$ are convex. Since $S$ contains all $c$ with
$c_k \ge 0$ and $\sum c_k = 1$, $S^\circ \ne \emptyset$.

Proposition 5.1. Suppose $\int q\,|\ln f_k| < \infty$ and $f_1, \ldots, f_L$
are linearly independent. Let $a \in (0, 1)$. If $\nu \in S^\circ$, then

$$\hat\nu_n \xrightarrow{P} \nu \iff \int p f_k = 1 \text{ for all } k.$$
Apparently, if $p \equiv 1$, then $\int p f_k = 1$. A question is whether
nontrivial $p \ge 0$ satisfying the condition exists. Since
$\int p (f_k - f_1) = 0$, provided $f_k \in L^2$, one might search for $p$
among functions in $L^2$ that are orthogonal to $f_k - f_1$. However, such
functions are not always nonnegative. Moreover, oftentimes $f_k \notin L^2$.
The construction below avoids these potential problems and seems to be
general.

Example 5.1. We only consider how to construct $p \ge 0$ that are unbounded
on $E = \{x : \nu^T f(x) > 0\}$. The general case follows the same idea. The
main step is to find bounded $\psi_1, \ldots, \psi_L \in C(\mathbb{R}^d)$,
such that the $L \times L$ matrix $M = (M_{ik})$ is nonsingular, where
$M_{ik} = \int \psi_i f_k$. Once this is done, to construct $p$, fix
$\varphi \ge 0$ continuous such that $\int \varphi f_k < \infty$ and
$\sup_{x \in E} \varphi(x) = \infty$. Such $\varphi$ always exist. By
$\det M \ne 0$, there are unique $a_1, \ldots, a_L \in \mathbb{R}$, such that
$\sum_i a_i M_{ik} = 1 - \int \varphi f_k$ for each $k$. Then
$\int h f_k = 1$, where $h = \varphi + \sum_i a_i \psi_i$. It is easy to see
$h \in C(\mathbb{R}^d)$ is lower bounded and $\sup_{x \in E} h(x) = \infty$.
Then for $c > 0$ small enough, $p = 1 - c + ch \in C(\mathbb{R}^d)$ is
nonnegative with $\sup_{x \in E} p(x) = \infty$ and
$\int p f_k = 1 - c + c \int h f_k = 1$.

To see that $\psi_1, \ldots, \psi_L$ as above exist, recall

$$\det M = \sum_\sigma \mathrm{sgn}(\sigma) \prod_{k=1}^L \int f_{\sigma(k)} \psi_k,$$

where the sum is over all permutations $\sigma$ of $1, \ldots, L$ and
$\mathrm{sgn}(\sigma)$ is the sign of $\sigma$. Denote
$D(x) = \det[f_i(x_k)]$. Since
$|D(x)| \le \sum_\sigma \prod_k f_{\sigma(k)}(x_k)$, $D \in L^1$. Because
$f_1, \ldots, f_L$ are linearly independent, we claim

(5.2) $\ell(x : D(x) \ne 0) \ne 0$,

where $\ell$ is the Lebesgue measure. If (5.2) holds, then the characteristic
function of $D$ is nonzero. Therefore, there are $t_1, \ldots, t_L \ne 0$,
such that $\int e^{i(t_1 x_1 + \cdots + t_L x_L)} D(x)\,dx \ne 0$. It follows
that there are $\psi_k(x)$ of the form $\sin(t_k x)$ or $\cos(t_k x)$, such
that $\det M \ne 0$.

We use induction to prove (5.2). For $L = 2$, if $D(x) = 0$ a.e., then
$f_1(x_1) f_2(x_2) = f_1(x_2) f_2(x_1)$ a.e. Integrating over $x_2$ yields
$f_1(x_1) = f_2(x_1)$ a.e., contradicting the assumption that $f_1$ and $f_2$
are linearly independent.

For $L > 2$, suppose (5.2) holds for $L - 1$ linearly independent $f_i$. Now

$$D(x) = \sum_{i=1}^L (-1)^{i+L} f_i(x_L)\, M_i(x_1, \ldots, x_{L-1}),$$

where $M_i(x_1, \ldots, x_{L-1})$ is the determinant of the
$(L-1) \times (L-1)$ matrix consisting of $f_l(x_k)$, $l \ne i$,
$k = 1, \ldots, L - 1$. Given $x_1, \ldots, x_{L-1}$, $D(x)$ is a linear
combination of $f_i(x_L)$. Therefore, if $D(x) = 0$ a.e., then, by the linear
independence of $f_i$, $M_i(x_1, \ldots, x_{L-1}) = 0$ a.e. for each
$i = 1, \ldots, L$. However, this contradicts the induction hypothesis. □
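
The construction in Example 5.1 can be checked numerically in a toy case. The sketch below assumes $d = 1$, $L = 2$, $f_1 = N(0,1)$, $f_2 = N(1,1)$, $\psi_1 = \cos$, $\psi_2 = \sin$, $\varphi(x) = x^2$ and $c = 0.5$; all of these choices, as well as the trapezoid-rule helper `integrate`, are hypothetical and serve only to show that the recipe produces a nonnegative, unbounded $p$ with $\int p f_k = 1$.

```python
import math


def normal_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)


def integrate(fn, lo=-12.0, hi=13.0, steps=25000):
    """Composite trapezoid rule on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (fn(lo) + fn(hi))
    for j in range(1, steps):
        total += fn(lo + j * width)
    return total * width


f = [lambda x: normal_pdf(x, 0.0), lambda x: normal_pdf(x, 1.0)]  # f_1, f_2
psi = [math.cos, math.sin]                                        # bounded psi_1, psi_2


def varphi(x):
    return x * x                                                  # continuous, >= 0, unbounded


# M_ik = integral of psi_i * f_k
M = [[integrate(lambda x, i=i, k=k: psi[i](x) * f[k](x)) for k in range(2)]
     for i in range(2)]
det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]

# Solve sum_i a_i M_ik = 1 - integral(varphi * f_k), k = 1, 2, by Cramer's rule.
b = [1 - integrate(lambda x, k=k: varphi(x) * f[k](x)) for k in range(2)]
a1 = (b[0] * M[1][1] - b[1] * M[1][0]) / det_M
a2 = (M[0][0] * b[1] - M[0][1] * b[0]) / det_M


def h(x):
    return varphi(x) + a1 * psi[0](x) + a2 * psi[1](x)


c = 0.5


def p(x):
    return 1 - c + c * h(x)


integrals = [integrate(lambda x, k=k: p(x) * f[k](x)) for k in range(2)]
p_min = min(p(-12.0 + 0.001 * j) for j in range(25001))
print(det_M, integrals, p_min)
```

With these choices $h$ dips below zero near its minimum, which is why $c$ must be taken small enough; $c = 0.5$ keeps $p$ strictly positive on the grid examined.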

6. Discussion. In the article, we have focused on the case of finitely 
composite nulls, where true nulls are only associated with a finite number 
of distributions. Formally, it is straightforward to generalize the constrained 
maximization to the case of infinitely composite nulls. However, usually the 
maximization will involve infinitely many degrees of freedom and it becomes 
unclear how to accommodate this with a finite number of observations. A 
more direct approach might be to partition the set of null distributions 
into a finite number of subsets and use the envelopes of the subsets to 
compute p-values. To be more specific, given a partition
$\Theta_1, \ldots, \Theta_L$ of $\Theta$, let
$u_k(t) = \sup_{\theta \in \Theta_k} \phi_\theta(t)$ and
$l_k(t) = \inf_{\theta \in \Theta_k} \phi_\theta(t)$. Then define, for
example, $M_n(t) = \sup\{c^T u(t) : c \in \Delta,\ c^T l(t)$ is dominated by
$F_n(t)$ up to a small margin$\}$. Unfortunately, some of the constraints
available to the finitely composite case can no longer be used. Another issue
is how to select the partition. Too coarse a partition will only yield loose
constraints on $c_k$ and too fine a partition will result in many degrees of
freedom. Either way, the obtained $M_n(t)$ may not be much different from the
unconstrained maximum probability.

As is known, FDR control can be realized by the local FDRs [5]. For the
simple case, the local FDR at $x$ is $(1 - a) f_0(x)/h(x)$, where $a$ may be
replaced with 0, $f_0$ is the density under true nulls, and $h$ is the overall
density of the data $X_1, \ldots, X_n$ or an estimate of the density. For the
finitely composite case where the null distributions have densities
$f_1, \ldots, f_L$, we may derive a conservative estimate of the local FDR by
$p(x)/h(x)$, where

$$p(x) = \sup\{c^T f(x) : c \in \Delta \text{ and } c^T f \le h\}.$$

Alternatively, if the dimension of $X_i$ is high, then we may work on
$s_i = s(X_i)$, with the local FDR defined as $p(s_i)/h(s_i)$, where $h$ is
now the overall density of $s_1, \ldots, s_n$ or an estimate, while

$$p(t) = \sup\{c^T \phi(t) : c \in \Delta \text{ and } c^T \phi \le h\}.$$

It is worth pointing out that, unlike the simple case, the BH procedure based
on $M_n(t)$ and the FDR control based on $p(x)/h(x)$ are no longer equivalent.
The reason is that $M_n(t)$ is of the form $\max_c \int c^T \phi$. The density
of $M_n(t)$, if existent, in general is different from $\max_c c^T \phi$ that
is associated with the local FDR. It remains to be seen how much difference
the two approaches may have.
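
To make the conservative local FDR concrete, the following sketch compares it with the true local FDR in a toy model. Everything specific in it is an assumption for illustration: two normal null densities, one normal alternative density, $a = 0.05$, $\nu = (0.5, 0.5)$, a known overall density $h$, and a brute-force grid over $c$ (with the constraint $c^T f \le h$ checked only on a finite set of points) in place of an exact maximization.

```python
import math


def normal_pdf(x, mu, sd=1.0):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


# Hypothetical setup: two null densities f_1, f_2 and one alternative density g.
a = 0.05
nu = (0.5, 0.5)
f = (lambda x: normal_pdf(x, -0.5), lambda x: normal_pdf(x, 0.5))


def g(x):
    return normal_pdf(x, 3.0)


def null_part(x):
    """(1 - a) * nu^T f(x)."""
    return (1 - a) * sum(v * fk(x) for v, fk in zip(nu, f))


def h(x):
    """Overall density (1 - a) nu^T f + a g, assumed known here."""
    return null_part(x) + a * g(x)


def true_local_fdr(x):
    return null_part(x) / h(x)


def conservative_local_fdr(x, step=0.025, tol=1e-12):
    """Grid approximation of sup{c^T f(x) : c_k >= 0, c_1 + c_2 <= 1, c^T f <= h} / h(x)."""
    xs = [-6 + 0.25 * j for j in range(61)]          # points where c^T f <= h is checked
    fx = [(f[0](t), f[1](t), h(t)) for t in xs]
    m = round(1 / step)
    best = 0.0
    for i in range(m + 1):
        for j in range(m + 1 - i):
            c1, c2 = i * step, j * step
            if all(c1 * u + c2 * v <= w + tol for u, v, w in fx):
                best = max(best, c1 * f[0](x) + c2 * f[1](x))
    return min(best / h(x), 1.0)


for x in (0.0, 3.0):
    print(x, true_local_fdr(x), conservative_local_fdr(x))
```

Because $(1 - a)\nu$ itself satisfies the constraints, the supremum can never fall below $(1 - a)\nu^T f(x)$, which is what makes the estimate conservative.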

Appendix. In this section, we give proofs of the theoretical statements
of the article. The Lebesgue measure on $\mathbb{R}^d$ will be denoted by
$\ell$. For any nondecreasing function $f$ defined on $\mathbb{R}$ and
$x \in \mathbb{R}$, if $A := \{t : f(t) \le x\} \ne \emptyset$, define
$f^*(x) = \sup A$, otherwise, define $f^*(x) = -\infty$. By this definition,
if $f$ is left-continuous and $x \in f(\mathbb{R})$, then $f(f^*(x)) = x$.
A.1. Proofs for Section 2.

Proof of Proposition 2.1. Since Sj = -co e A for all i, by 

D3 and the random mixture model, the probability of the event is 0, hence 
proving 1). By the right-continuity of Df, 

Si <t Xi e D s for all s > t Xi £ D t , 

yielding 2). By P{ Si < t) = P(Xi G A) = 3 ) holds and 4 ) follows 

from 3) and the random mixture model. To get 5), given $t$, for any
$\epsilon > 0$, there is $\theta \in \Theta$ such that
$M(t) \le \phi_\theta(t) + \epsilon$. By D3,
$M(s) \ge \phi_\theta(s) \to \phi_\theta(t)$ as $s \uparrow t$, giving
$M(t-) + \epsilon \ge M(t)$. Since $M$ is nondecreasing and $\epsilon$ is
arbitrary, this implies $M(t-) = M(t)$. □

Proof of Proposition 2.2. To see that Procedures 2.1 and 2.2 are the 
same, by Proposition 2.1, 

Procedure 2.1 accepts Hi <^=^ Si > r <^=> — — > - Vt > Sj. 

a n 

Because $M(t)$ is nondecreasing and $R_n(t)$ is a nondecreasing step function
that has jumps only at $s_i$,

M(s) Rn(s) 

Procedure 2.1 accepts Hi <^=^ — > — - — — Vs 7 - > Sj. 

a n 

Taking into account the possibility of ties, it is not hard to see that the 
condition on the right hand side is equivalent to Sj > s^m, which implies 
Procedures 2.1 and 2.2 always reject the same set of nulls. 

By the random mixture model, for Xi under true nulls, the distribution of 
^(AJ is a mixture of those of (pg(s(X)) under Fq, 6 £ Q. By Proposition 

2.1, under Fg, (pg(s(X)) ~ Unif(0, 1). Therefore, for Xi under true nulls, Sj 
are iid ~ Unif(0, 1). 

Procedure 2.2 is the BH procedure applied to M(si). Since M(si) > 
F m (D Si ), under true nulls, P(M( Si ) < x) < P(F m (D Si ) <x)=x. The proof 
then follows from Theorem 5.1 and the comment that follows in [3]. □ 

A.2. Proofs for Section 3. First, note that for Procedures 3.1 and 3.2, the
number of rejections and that of false rejections are $R = R_n(\tau)$ and
$V = V_n(\tau)$, respectively.

Proof of Lemma 3.1. Let $s < t$. Then
$\mathcal{A}_{n,s} \subset \mathcal{A}_{n,t}$ and
$c^T \phi(s) \le c^T \phi(t)$ for any $c \in \Delta$, giving
$M_n(s) \le M_n(t)$. Thus $M_n$ is nondecreasing. Next suppose
$\phi_i \in C(\mathbb{R})$ for all $i$.

1) Given $t$, as $0 < t - u \ll 1$, $[u, t)$ has no point in $T_n$ and,
almost surely, no $s_i$. Thus $\mathcal{A}_{n,u} = \mathcal{A}_{n,t}$. Let
$K = \{c \in \Delta : c^T \phi \in \mathcal{A}_{n,t}\}$. It is seen that $K$
is compact and $c^T \phi(s)$ is a uniformly continuous function in
$(c, s) \in K \times \mathcal{I}$. Then $\sup_{c \in K} c^T \phi(s)$ is
continuous in $s$, yielding $M_n(u) \to M_n(t)$ as $u \uparrow t$. Thus $M_n$
is left-continuous.

2) Since $M_n$ is nondecreasing, $M_n$ has a right-hand limit at every $t$.
It only remains to be shown that at every $t \notin \{s_1, \ldots, s_n\}$,
$M_n$ is right-continuous. Now, as $0 < u - t \ll 1$, $[t, u)$ contains no
point in $T_n$ and no $s_i$, yielding
$\mathcal{A}_{n,u} = \mathcal{A}_{n,t}$. Then the right-continuity follows
from the same argument for the left-continuity. □

In addition to Lemma 3.1, we need a few lemmas to prove Theorem 3.1.
For $t \in \mathcal{I}$, define the $\sigma$-field

$$\mathcal{F}_t = \sigma(R_n(t-),\ V_n(t-),\ R_n(s),\ V_n(s) : s > t).$$

Then $\{\mathcal{F}_t, t \in \mathcal{I}\}$ is a backward filtration, i.e.,
$\mathcal{F}_t \subset \mathcal{F}_s$ for $t > s$.

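Lemma A.2.1. Suppose $\phi_i \in C(\mathbb{R})$ for all $i$. Then for
$t \in \mathbb{R}$, $M_n(t)$ is $\mathcal{F}_t$-measurable.

Proof. It suffices to show that given $a > 0$,
$\{M_n(t) \le a\} \in \mathcal{F}_t$ for $t \in \mathcal{I}$. For
$c \in \Delta$, $c^T \phi \in C(\mathbb{R})$ and
$\{c^T \phi \in \mathcal{A}_{n,t}\} = E_1 \cap E_2$, where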

£a = |c T 0(sj) < F n (sj) + e n for Sj > ij , 

^ = |F n (t 2 )-F n (t 1 ) >c T [0(t 2 )-0(ti)]-e n fori 
2 \ ti G T n n [t, r 2 ] with tx < t 2 J 

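Note $E_1 = \{c^T \phi(s) \le R_n(s)/n + \epsilon_n\ \forall s \ge t$ with
$R_n(s) > R_n(s-)\}$. Since $R_n(s-) \in \mathcal{F}_t$ for $s \ge t$, it can
be seen that $E_1 \in \mathcal{F}_t$. On the other hand,
$E_2 \in \mathcal{F}_t$. Therefore,
$\{c^T \phi \in \mathcal{A}_{n,t}\} \in \mathcal{F}_t$.

Since $c^T \phi \in \mathcal{A}_{n,t}$ implies
$r^T \phi \in \mathcal{A}_{n,t}$ for any $r \in \mathbb{Q}^L \cap \Delta$ with
$r_i \le c_i$, where $\mathbb{Q}$ is the set of rational numbers,
$M_n(t) = \sup\{r^T \phi(t) : r \in \mathbb{Q}^L \cap \Delta,\ r^T \phi \in \mathcal{A}_{n,t}\}$.
Notice that $r^T \phi(t)$ is nonrandom. Then

$$\{M_n(t) \le a\} = \bigcap_{r \in \mathbb{Q}^L \cap \Delta \text{ s.t. } r^T \phi(t) > a} \{r^T \phi \notin \mathcal{A}_{n,t}\} \in \mathcal{F}_t. \qquad \Box$$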

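The next goal is to show $\tau$ is a stopping time of the backward filtration
$\mathcal{F}_t$. If $\sup \mathcal{I} = \infty$, then $\tau$ has to start at
$\infty$. To get around this problem, we use truncations. Let
$\mathcal{I}_\tau$ be as in Procedure 3.1. Given $c < \sup \mathcal{I}$,
define

$$\mathcal{I}_c = \mathcal{I}_\tau \cap (-\infty, c], \qquad
\tau_c = \begin{cases} \sup \mathcal{I}_c & \text{if } \mathcal{I}_c \ne \emptyset, \\ \inf \mathcal{I} & \text{otherwise.} \end{cases}$$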

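Lemma A.2.2. As $c \uparrow \sup \mathcal{I}$, $\tau_c \uparrow \tau$ a.s.

Proof. It suffices to show $\tau < \sup \mathcal{I}$ a.s. By definition,
$\tau \le \sup \mathcal{I}$. The event $\{\tau = \sup \mathcal{I}\}$ implies
there are $t_k \uparrow \sup \mathcal{I}$, such that
$M_n(t_k) \le \alpha [R_n(t_k) \vee 1]/n$. By Lemma 3.1,
$M_n(t_k) \to M_n(\sup \mathcal{I}) = 1$ a.s. On the other hand,
$[R_n(t_k) \vee 1]/n \le 1$. Therefore, $P(\tau = \sup \mathcal{I}) = 0$. □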
Lemma A.2.3. Suppose $\phi_k \in C(\mathbb{R})$. Then 1) there is
$t_0 > \inf \mathcal{I}$ such that for any $c \in \mathcal{I}$,
$\tau_c \ge t_0$, 2) for $c \in \mathcal{I}$, $\tau_c$ is a stopping time of
the backward filtration $\{\mathcal{F}_t, t \in (\inf \mathcal{I}, c]\}$.

Proof. 1) Let $u(t) := \phi_1(t) + \cdots + \phi_L(t)$. Then
$M_n(t) \le u(t)$ and

$$\tau_c \ge \sup\left\{t \in (\inf \mathcal{I}, c] : \frac{u(t)}{\alpha} \le \frac{R_n(t) \vee 1}{n}\right\}
\ge t_0 := \sup\left\{t \in (\inf \mathcal{I}, c] : \frac{u(t)}{\alpha} \le \frac{1}{n}\right\}.$$

Since $\phi_k \in C(\mathbb{R})$ and $\phi_k(t) \to 0$ as
$t \to \inf \mathcal{I}$, the set on the right hand side is nonempty, yielding
$t_0 > \inf \mathcal{I}$.

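2) By definition, $\tau_c$ is a stopping time of the backward filtration
$\mathcal{F}_t$ if $\{\tau_c \ge t\} \in \mathcal{F}_t$ for every
$t \in (\inf \mathcal{I}, c]$. Denote $E = \{\tau_c \ge t\}$. We first show

(A.1) $E = \left\{\exists s \in [t, c] \text{ such that } \dfrac{M_n(s)}{\alpha} \le \dfrac{R_n(s) \vee 1}{n}\right\}$.

The right hand side of (A.1) equals
$\{\mathcal{I}_c \cap [t, c] \ne \emptyset\}$, which is a subset of $E$. On
the other hand, the difference between the two events is

$$\{\tau_c \ge t,\ \mathcal{I}_c \cap [t, c] = \emptyset\}
= \{\mathcal{I}_c \ne \emptyset,\ \mathcal{I}_c \cap [t, c] = \emptyset,\ \sup \mathcal{I}_c \ge t\}$$
$$\subset \left\{\frac{M_n(t)}{\alpha} > \frac{R_n(t) \vee 1}{n},\ \exists t_k \uparrow t \text{ with } \frac{M_n(t_k)}{\alpha} \le \frac{R_n(t_k) \vee 1}{n}\right\}.$$

Since by Lemma 3.1 $M_n$ is left-continuous, $M_n(t_k) \to M_n(t)$. On the
other hand, $R_n(t_k) \to R_n(t-) \le R_n(t)$. Thus, the last event is empty
and (A.1) holds. Note that by similar argument,

(A.2) $M_n(\tau_c)/\alpha \le [R_n(\tau_c) \vee 1]/n$.

Let $A = \{M_n(t)/\alpha \le [R_n(t) \vee 1]/n\}$. Then $A \subset E$ and
$A \in \mathcal{F}_t$. We next show $E = A \cup \Gamma$, where
$\Gamma = \bigcap_{k=1}^\infty \bigcup_{r \in \mathbb{Q} \cap (t, c]} \Gamma_{r,k}$,
with

$$\Gamma_{r,k} = \left\{\frac{M_n(r)}{\alpha} \le \frac{R_n(r + 1/k) \vee 1}{n}\right\}.$$

Once this is done, by $M_n(r) \in \mathcal{F}_r$ (cf. Lemma A.2.1) and
$R_n(r + 1/k) \in \mathcal{F}_r$,
$\Gamma_{r,k} \in \mathcal{F}_r \subset \mathcal{F}_t$ for any $r > t$. Then
$E \in \mathcal{F}_t$.

Note $E - A$ implies $\tau_c > t$, which in turn implies there are
$r_k \in \mathbb{Q}$ with $t < r_k \le \tau_c < r_k + 1/k$. By

$$\frac{M_n(r_k)}{\alpha} \le \frac{M_n(\tau_c)}{\alpha} \le \frac{R_n(\tau_c) \vee 1}{n} \le \frac{R_n(r_k + 1/k) \vee 1}{n},$$

$\Gamma_{r_k,k}$ holds for all $k$. Thus $E - A \subset \Gamma$.

It remains to show that $\Gamma \subset E$. Suppose there are
$r_k \in \mathbb{Q} \cap (t, c]$ with
$M_n(r_k)/\alpha \le [R_n(r_k + 1/k) \vee 1]/n$. Then $r_k$ has a
subsequence, say, itself, that converges to some $s \in [t, c]$. Since $M_n$
is nondecreasing and left-continuous, while $R_n$ is nondecreasing and
right-continuous,

$$\frac{M_n(s)}{\alpha} \le \liminf_k \frac{M_n(r_k)}{\alpha} \le \limsup_k \frac{R_n(r_k + 1/k) \vee 1}{n} \le \frac{R_n(s) \vee 1}{n}.$$

Therefore $s \in \mathcal{I}_c$ and
$\mathcal{I}_c \cap [t, c] \ne \emptyset$. Thus $\Gamma \subset E$. □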

Lemma A.2.4. For $n \ge 1$, denote

(A.3) $\Gamma_n = \{(1 - a)\nu^T \phi \in \mathcal{A}_{n,t},\ \forall t \in \mathcal{I}\}$.

Suppose $Q$ is continuous. Then, provided $\exp(-2n\epsilon_n^2) \le 1/2$,

$$P(\Gamma_n) \ge 1 - (1 + |T_n|) \exp\{-2n\epsilon_n^2\}.$$

Proof. Since $Q$ is continuous, by the DKW inequality [12], for
$\lambda > 0$ and $n \ge 1$, as long as $\exp(-2n\lambda^2) \le 1/2$,

$$P\{\sup(Q - F_n) > \lambda\} \le \exp(-2n\lambda^2).$$

By $Q(t) = (1 - a)\nu^T \phi(t) + aG(D_t)$,

$$P\{(1 - a)\nu^T \phi(t) > F_n(t) + \lambda \text{ for some } t\} \le \exp(-2n\lambda^2).$$

DKW inequality also implies that, given $x$,

(A.4) $P\left\{\sup_{t \ge x} \{[Q(t) - Q(x)] - [F_n(t) - F_n(x)]\} > \lambda\right\} \le \exp(-2n\lambda^2)$.

Assuming (A.4) is true for now, it follows that

$$P\left\{Q(t) - Q(t_i) > F_n(t) - F_n(t_i) + \lambda \text{ for some } t_i \in T_n \text{ and } t \ge t_i\right\} \le |T_n| \exp(-2n\lambda^2).$$

Since $Q(t) - Q(t_i) \ge (1 - a)\nu^T[\phi(t) - \phi(t_i)]$ for $t \ge t_i$,
by letting $\lambda = \epsilon_n$, the Lemma then follows.

Finally, to get (A.4), let $y = Q(x)$. By quantile transformation,

$$\sup_{t \ge x} \{[Q(t) - Q(x)] - [F_n(t) - F_n(x)]\} \sim \xi := \sup_{s \ge y} \{s - y - [G_n(s) - G_n(y)]\},$$

where $G_n$ is the empirical distribution of $U_i = Q(X_i)$. Since $U_i$ are
iid $\sim \mathrm{Unif}(0, 1)$, $V_i = U_i - y + 1\{U_i < y\}$ are iid
$\sim \mathrm{Unif}(0, 1)$ as well and
$\xi = \sup_{0 \le s \le 1 - y} [s - G'_n(s)]$, where $G'_n$ is the empirical
distribution of $V_i$. Applying DKW inequality to $\xi$, it is seen that (A.4)
follows. □
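
To get a feeling for the size of the bound in Lemma A.2.4, the sketch below evaluates $1 - (1 + |T_n|)\exp(-2n\epsilon_n^2)$ numerically. It assumes the choices $\epsilon_n = \sqrt{\ln n / n}$ and $|T_n| = \lfloor (\ln n)^2 \rfloor$ that appear in the proof of (4.1); the function name is hypothetical.

```python
import math


def gamma_n_lower_bound(n):
    """Lower bound 1 - (1 + |T_n|) exp(-2 n eps_n^2) from Lemma A.2.4,
    under the assumed choices eps_n = sqrt(ln n / n), |T_n| = floor((ln n)^2)."""
    eps = math.sqrt(math.log(n) / n)
    grid_size = math.floor(math.log(n) ** 2)
    tail = math.exp(-2 * n * eps ** 2)        # equals n^(-2) for this eps_n
    if tail > 0.5:
        raise ValueError("the lemma requires exp(-2 n eps_n^2) <= 1/2")
    return 1 - (1 + grid_size) * tail


bound = gamma_n_lower_bound(5000)
print(bound)
```

With this $\epsilon_n$ the exponential term reduces to $n^{-2}$, so for $n = 5000$ the probability of the complement of the event in (A.3) is bounded by $73/5000^2$, which is of order $10^{-6}$.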
Proof of Theorem 3.1. By Proposition 2.1, under true $H_i$,
$s_i \sim \nu^T \phi$, which is continuous and positive on $\mathcal{I}$. As a
result, $\{V_n(t-)/\nu^T \phi(t),\ \mathcal{F}_t,\ t \in \mathcal{I}\}$ is a
left-continuous backward martingale.

Fix $c \in \mathcal{I}$. By Lemma A.2.3, $\tau_c$ is a stopping time of
$\{\mathcal{F}_t, t \in (\inf \mathcal{I}, c]\}$ with
$\tau_c \ge t_0 > \inf \mathcal{I}$. Then $\nu^T \phi(\tau_c) > 0$ and
$V_n(\tau_c-)/[\nu^T \phi(\tau_c)]$ is well-defined. By the optional sampling
theorem (cf. [9], Ch. 1, Thm 3.22),

$$E\left[\frac{V_n(\tau_c-)}{\nu^T \phi(\tau_c)}\right] = E\left[\frac{V_n(c-)}{\nu^T \phi(c)}\right] = (1 - a)n.$$

Let $c \uparrow \sup \mathcal{I}$. By Lemma A.2.2, $\tau_c \uparrow \tau$.
Because $V_n(\tau_c-) \uparrow V_n(\tau-) \le n$,
$\phi_k(\tau_c) \uparrow \phi_k(\tau)$ and
$\nu^T \phi(\tau_c) \ge \nu^T \phi(t_0) > 0$, by dominated convergence,

(A.5) $E\left[\dfrac{V_n(\tau-)}{\nu^T \phi(\tau)}\right] = (1 - a)n$.

On the other hand, because $Q$ is continuous, by Lemma A.2.4, with $\Gamma_n$
defined as in (A.3),

$$E\left[\frac{V_n(\tau)}{R_n(\tau) \vee 1}\right]
= E\left[\frac{V_n(\tau-)}{R_n(\tau) \vee 1}\right] + E\left[\frac{V_n(\tau) - V_n(\tau-)}{R_n(\tau) \vee 1}\right]$$
$$\le E\left[\frac{V_n(\tau-)}{R_n(\tau) \vee 1}\,\Big|\,\Gamma_n\right] P(\Gamma_n) + P(\Gamma_n^c) + E\left[\frac{V_n(\tau) - V_n(\tau-)}{R_n(\tau) \vee 1}\right].$$

From (A.2), $M_n(\tau)/\alpha \le [R_n(\tau) \vee 1]/n$. On the other hand,
conditional on $\Gamma_n$, $M_n(\tau) \ge (1 - a)\nu^T \phi(\tau)$. Thus, by
(A.5)

$$E\left[\frac{V_n(\tau-)}{R_n(\tau) \vee 1}\,\Big|\,\Gamma_n\right] P(\Gamma_n)
\le E\left[\frac{\alpha V_n(\tau-)/n}{(1 - a)\nu^T \phi(\tau)}\,\Big|\,\Gamma_n\right] P(\Gamma_n)
\le E\left[\frac{\alpha V_n(\tau-)/n}{(1 - a)\nu^T \phi(\tau)}\right] = \alpha.$$

By Lemma A.2.4, $P(\Gamma_n^c) \le (1 + |T_n|) \exp(-2n\epsilon_n^2)$.
Finally, note that $R_n(\tau) = 0$ implies $V_n(\tau) - V_n(\tau-) = 0$ while
$V_n(\tau) - V_n(\tau-) \ge 2$ implies at least two true nulls have the same
value of $s_i$. Since $s_i$ under true nulls are iid with a density, the
probability of the latter event is 0. Therefore,
$V_n(\tau) - V_n(\tau-) \le 1\{R > 0\}$ a.s. This then finishes the proof. □

We next prove Theorem 3.2. For $n \ge 1$, define

$$\Gamma_n = \{c \in \Delta : c^T \phi \in \mathcal{A}_n\}.$$

For each $r \ge 0$, corresponding to (3.3), define

$$\Gamma_r = \left\{c \in \Delta : c^T \phi(t) \le Q(t) + r,\ Q(t_2) - Q(t_1) \ge c^T[\phi(t_2) - \phi(t_1)] - r,\ t_1 < t_2\right\}.$$

Both $\Gamma_n$ and $\Gamma_r$ are nonempty since they contain 0. It is not
hard to see that $\Gamma_n$ and $\Gamma_r$ are convex and closed, with
$\Gamma_r$ being increasing and $\Gamma_0 = \bigcap_{r > 0} \Gamma_r$. Also,
whereas $\Gamma_n$ are random, $\Gamma_r$ are nonrandom. Observe that for each
$t \in \mathcal{I}$,

(A.6) $M_n(t) = \sup\{c^T \phi(t) : c \in \Gamma_n\}$, $m(t) = \sup\{c^T \phi(t) : c \in \Gamma_0\}$.

Because $\Gamma_n$ is compact, there is a random $c(t) \in \Gamma_n$, such
that

(A.7) $M_n(t) = c(t)^T \phi(t)$.

As commented after Theorem 3.2, we need to get $M_n \to m$. One way to do this
is to first get $\Gamma_n \to \Gamma_0$, which is formalized below.

Lemma A.2.5. Let $r > 0$. Then under the conditions of Theorem 3.2,

$$P(\Gamma_0 \subset \Gamma_n \subset \Gamma_r) \to 1.$$

Proof. By the assumptions, $Q(t)$ is continuous. Let

$$E_n = \left\{\sup_t |F_n(t) - Q(t)| \le \epsilon_n/2\right\}.$$

Then, as in the proof of Lemma A.2.4, for $n \ge 1$, as long as
$\exp(-n\epsilon_n^2/2) \le 1/2$,
$P(E_n^c) \le 2\exp\{-n\epsilon_n^2/2\}$. It is not hard to see that $E_n$
implies $\Gamma_0 \subset \Gamma_n$. As $n\epsilon_n^2 \to \infty$,
$P(\Gamma_0 \subset \Gamma_n) \ge P(E_n) \to 1$.

Since $\nu^T \phi$ is supported by and strictly increasing in $\mathcal{I}$,
almost surely, as $n \to \infty$, the set of $s_i$ under true nulls is
increasingly dense in $\mathcal{I}$, and thus so is
$S_n = \{s_1, \ldots, s_n\}$. Because $\phi_k$ and $Q$ are continuous
distribution functions, they are equicontinuous. Given $r > 0$, fix $C > 0$
and $\delta > 0$, such that

$$\max_k [\phi_k(-C) + 1 - \phi_k(C)] + Q(-C) + 1 - Q(C) < r,$$
$$\max_k |\phi_k(s) - \phi_k(t)| + |Q(s) - Q(t)| < r, \quad \text{if } |s - t| < \delta.$$

Let $E'_n = \{\delta(S_n, [-C, C]) + \delta(T_n, [-C, C]) < \delta\}$.
Conditional on $E_n \cap E'_n$, if $t \in [-C, C]$, then
$|Q(t) - F_n(t)| \le \epsilon_n$ and there is $s_i$ with $|t - s_i| < \delta$.
Let $c \in \Gamma_n$. By $c_k \ge 0$, $c_1 + \cdots + c_L \le 1$ and
$c^T \phi(s_i) \le F_n(s_i) + \epsilon_n$,

$$c^T \phi(t) \le c^T \phi(s_i) + \max_k |\phi_k(t) - \phi_k(s_i)|
\le F_n(s_i) + \epsilon_n + r
\le Q(s_i) + 2\epsilon_n + r
\le Q(t) + 2\epsilon_n + 2r.$$

If $t < -C$, then $c^T \phi(t) \le \max_k \phi_k(-C) < r \le Q(t) + r$. If
$t > C$, then $c^T \phi(t) \le 1 \le Q(t) + r$. In any case,
$c^T \phi(t) \le Q(t) + 2\epsilon_n + 2r$.

Similarly, for $t_1 < t_2$, it can be shown that
$c^T[\phi(t_2) - \phi(t_1)] \le Q(t_2) - Q(t_1) + 3\epsilon_n + 4r$. As a
result, $c \in \Gamma_\sigma$, with $\sigma = 3\epsilon_n + 4r$. Then
$E_n \cap E'_n \subset \{\Gamma_n \subset \Gamma_\sigma\}$. Because
$\epsilon_n \to 0$, $P(E_n \cap E'_n) \to 1$ and $r$ is arbitrary, the proof
is complete. □






Lemma A.2.6. Suppose $a < 1$. Then, under the conditions of Theorem 3.2, as
$n \to \infty$, $P(M_n \in C(\mathbb{R})) \to 1$ and
$\sup |M_n - m| \xrightarrow{P} 0$. Also, $m \in C(\mathbb{R})$.

Proof. Because each $\phi_k$ is bounded, nondecreasing and continuous,
$\phi_k$ is uniformly continuous on $\mathbb{R}$. Since $\Gamma_n$ is compact,
the functions $c^T \phi(t)$, $c \in \Gamma_n$, are equicontinuous and
uniformly bounded. It follows that $M_n \in C(\mathbb{R})$. Likewise, since
$\Gamma_0$ is compact, $m \in C(\mathbb{R})$.

Given $\sigma > 0$, since $\Gamma_r$ is compact and
$\Gamma_r \downarrow \Gamma_0$ as $r \downarrow 0$, there is $r > 0$ such that
for all $c \in \Gamma_r$, $d(c, \Gamma_0) < \sigma$. Conditional on
$\Gamma_0 \subset \Gamma_n$, by (A.6), $m(t) \le M_n(t)$ for all $t$. On the
other hand, conditional on $\Gamma_n \subset \Gamma_r$, for any $t$, there is
$c_0(t) \in \Gamma_0$ such that $|c(t) - c_0(t)| < \sigma$, where $c(t)$ is
defined as in (A.7). Then

$$|M_n(t) - c_0(t)^T \phi(t)| \le |c(t) - c_0(t)|\,|\phi(t)| \le \sqrt{L}\,\sigma
\implies M_n(t) \le c_0(t)^T \phi(t) + \sqrt{L}\,\sigma \le m(t) + \sqrt{L}\,\sigma.$$

Thus,
$\{\Gamma_0 \subset \Gamma_n \subset \Gamma_r\} \subset \{0 \le M_n(t) - m(t) \le \sqrt{L}\,\sigma \text{ all } t\}$.
Because $\sigma$ is arbitrary, by Lemma A.2.5,
$\sup |M_n - m| \xrightarrow{P} 0$. □

Proof of Theorem 3.2. The proof follows closely the one in [6]. By
Assumption A and the continuity of $m$ and $Q$, for any
$0 < \epsilon \ll t^* - t_0$,

$$\delta = \min\left\{\inf_{t \in (t_0 + \epsilon,\, t^* - \epsilon)} [\alpha Q(t) - m(t)],\ \inf_{t \ge t^* + \epsilon} [m(t) - \alpha Q(t)]\right\} > 0.$$

Let $Q_n(t) = [R_n(t) \vee 1]/n$. As $n \to \infty$, because
$\sup |Q_n - Q| \xrightarrow{P} 0$ and $\sup |M_n - m| \xrightarrow{P} 0$,
the probability that

$$\min\left\{\inf_{t \in (t_0 + \epsilon,\, t^* - \epsilon)} [\alpha Q_n(t) - M_n(t)],\ \inf_{t \ge t^* + \epsilon} [M_n(t) - \alpha Q_n(t)]\right\} \ge \delta/2$$

tends to 1, implying $P(|\tau - t^*| \le \epsilon) \to 1$. Therefore,
$\tau \xrightarrow{P} t^*$, which leads to the last claim of the theorem.
Since $t^* > \inf \mathcal{I}$ and $\nu^T \phi(t)$ is strictly increasing,
$Q(t^*) \ge (1 - a)\nu^T \phi(t^*) > 0$. By the Weak Law of Large Numbers
and dominated convergence,



FDR = E 



Vn(r)/n 
Qn{r) 



1 - a)i>- T 0(f*) m(i* 



— 7T? — 7 = a ' 



where the last equality is due to the continuity of $m$ and $Q$ at $t^*$. □

A.3. Proofs for Section 5. We need two lemmas for the proof of
Proposition 5.1.

Lemma A.3.1. Suppose $f_1, \ldots, f_L$ are linearly independent. Then $S$ in
(5.1) is a convex compact set.

Proof. It is easy to see that $S$ is convex and closed, so it suffices to
show $S$ is bounded. Suppose there are $c_l \in S$ with $|c_l| \to \infty$ as
$l \to \infty$. Since $c_{l1} + \cdots + c_{lL} = 1$, this implies
$\max_k c_{lk} \to \infty$ and $\min_k c_{lk} \to -\infty$. There is a
subsequence of $c_l$ and a partition of $\Theta$ into
$\{\theta_{i_1}, \ldots, \theta_{i_r}\}$ and
$\{\theta_{j_1}, \ldots, \theta_{j_t}\}$, with $r > 0$, $t > 0$ and
$r + t = L$, such that $c_{l i_s} \ge 0$ and $c_{l j_s} < 0$ for each $c_l$ in
the subsequence. Without loss of generality, assume $c_{li} \ge 0$ for
$i = 1, \ldots, r$ and $c_{li} < 0$ for $i = r + 1, \ldots, L$. Denote
$d_{li} = -c_{l,r+i}$ for $i = 1, \ldots, t$. Then for every $x$,

$$\sum_{k=1}^r c_{lk} f_k(x) \ge \sum_{k=1}^t d_{lk} f_{r+k}(x), \qquad
\sum_{k=1}^r c_{lk} = 1 + M_l, \quad \text{with } M_l = \sum_{k=1}^t d_{lk}.$$

Divide both sides of the inequality by $M_l$ and let $l \to \infty$. Since
$M_l \to \infty$, there is a sequence of $l$ along which
$(c_{l1}, \ldots, c_{lr})^T/M_l$ and $(d_{l1}, \ldots, d_{lt})^T/M_l$ have
limits, say $(u_1, \ldots, u_r)^T$ and $(v_1, \ldots, v_t)^T$. Then

$$\sum_{k=1}^r u_k f_k(x) \ge \sum_{k=1}^t v_k f_{r+k}(x), \quad \text{all } x.$$

It is easy to see that $u_k \ge 0$, $v_k \ge 0$ and
$\sum u_k = \sum v_k = 1$. Because the integrals of both sides are equal to 1,
equality must hold. As a result, $f_1, \ldots, f_L$ are linearly dependent,
which is a contradiction. □

Lemma A.3.2. Suppose $\int q\,|\ln f_k| < \infty$ for all $k$.

1) For $c \in S^\circ$ and $r > 0$, if $c + v \in S$
$\forall v \in B(0, r)$ with $\sum v_k = 0$, then

(A.1) $c^T f(x) \ge r[M_f(x) - m_f(x)]$, all $x$,

where $M_f(x) = \max_k f_k(x)$ and $m_f(x) = \min_k f_k(x)$.

2) For any $c \in S^\circ$, $\ln(c^T f) \in L^1(Q)$.

3) Let $\ell(c) := \int q \ln(c^T f)$. Then $\ell \in C(S^\circ)$.

4) For any $c \in S^\circ$ and $x$, $c^T f(x) = 0 \iff f(x) = 0$.

5) If $f_1, \ldots, f_L$ are linearly independent, then $\ell$ is strictly
concave in $S^\circ$.

Proof. 1) For any $v \in B(0, r)$ with $\sum v_k = 0$, by
$(c + v)^T f(x) \ge 0$, $c^T f(x) \ge -\sum v_k f_k(x)$. Let $v_k = -r$ if
$k = \min\{i : f_i(x) = M_f(x)\}$, $v_k = r$ if
$k = \min\{i : f_i(x) = m_f(x)\}$, and $v_k = 0$ otherwise. Then (A.1)
follows.

2) Let $t^+ = t \vee 0$ and $t^- = (-t) \vee 0$. By Lemma A.3.1,
$\sum c_k^-$ and $\sum c_k^+ = \sum c_k^- + 1$ are bounded on $S$. Fix
$\lambda \in (0, 1)$ such that $(1 - \lambda)\sum c_k^- \le \lambda/2$ on
$S$. If $M_f(x) > m_f(x)/\lambda$, then by (A.1),
$c^T f(x) \ge r(\lambda^{-1} - 1) m_f(x)$. If $M_f(x) \le m_f(x)/\lambda$,
then

$$c^T f(x) = \sum c_k^+ f_k(x) - \sum c_k^- f_k(x) \ge m_f(x)/2.$$

Thus, there is a constant $\kappa > 0$ such that
$c^T f(x) \ge \kappa (r \wedge 1) m_f(x)$. On the other hand,
$c^T f(x) \le \sum c_k^+ M_f(x) \le \kappa' M_f(x)$, where
$\kappa' < \infty$ is another constant. As a result,

(A.2) $|\ln[c^T f(x)]| \le \max\left(|\ln[\kappa' M_f(x)]|,\ |\ln[\kappa (r \wedge 1) m_f(x)]|\right)$.

Then by $\ln f_k \in L^1(Q)$, $\ln(c^T f) \in L^1(Q)$.

3) Follows from (A.2) and dominated convergence.

4) If $c^T f(x) = 0$, then by (A.1), $M_f(x) = m_f(x)$ and hence $f_k(x)$ are
all equal. As a result, $f_k(x) = c^T f(x) = 0$.

5) For $c_1, c_2 \in S^\circ$ and $\theta \in (0, 1)$, since $S^\circ$ is
convex, $c := (1 - \theta)c_1 + \theta c_2 \in S^\circ$. Because $\ln z$ is
strictly concave on $(0, \infty)$,
$(1 - \theta)\ell(c_1) + \theta \ell(c_2) \le \ell(c)$, with "="
$\iff c_1^T f(x) = c_2^T f(x)$ for $x$ with $q(x) > 0$. On the other hand, if
$q(x) = 0$, then $\nu^T f(x) = 0$ and by 4), $f(x) = 0$. Therefore, "="
implies $c_1^T f(x) = c_2^T f(x)$ for all $x$. Since $f_k$ are linearly
independent, it follows that "=" $\iff c_1 = c_2$. Therefore, $\ell$ is
strictly concave. □

Proof of Proposition 5.1. By Lemma A.3.2, for $c \in S^\circ$ and
$X \sim Q$, $\ln[c^T f(X)] \in L^1$, so by the Weak Law of Large Numbers, as
$n \to \infty$,
$n^{-1} \sum_{i=1}^n \ln[c^T f(X_i)] \xrightarrow{P} \ell(c)$. Since $S$ is
compact and $\ell$ is continuous and strictly concave on $S^\circ$, by
standard argument, if $\ell$ has a maximum point in $S^\circ$, then the point
is unique and $\hat\nu_n$ converges in probability to it. Thus, to finish the
proof, it suffices to show that $\nu$ is the maximum point of $\ell(c)$ if and
only if $\int p f_k = 1$.

Let $\pi$ be the map $c \to (c_1, \ldots, c_{L-1})^T$ and
$d_k(x) = f_k(x) - f_L(x)$, $k < L$. Since $c_1 + \cdots + c_L = 1$ for
$c \in S$, then

$$c^T f(x) = f_L(x) + \sum_{k=1}^{L-1} c_k [f_k(x) - f_L(x)] = f_L(x) + \pi(c)^T d(x).$$

Denote $h(u, x) = f_L(x) + u^T d(x)$ and
$H(u) = \int q(x) \ln h(u, x)\,dx$. Then $\ell(c) = H(\pi(c))$. Since $\ell$
is strictly concave in $S^\circ$, so is $H$ on $\Gamma^\circ$, with
$\Gamma = \pi(S) = \{u : f_L + u^T d \ge 0\}$. Note that
$\pi : S \to \Gamma$ is bijective with
$\pi^{-1}(u) = (u, 1 - \sum u_k)$ and $\pi(S^\circ) = \Gamma^\circ$. It
remains to be seen that $H$ is differentiable in $\Gamma^\circ$, with

$$\frac{\partial H(u)}{\partial u_k} = \int \frac{q(x) d_k(x)}{h(u, x)}\,dx, \quad k = 1, \ldots, L - 1.$$

Once this obtains, by the strict concavity of $H$ and
$(\nu_1, \ldots, \nu_{L-1}) \in \Gamma^\circ$,

$\nu$ is the maximum point of $\ell$
$\iff (\nu_1, \ldots, \nu_{L-1})$ is the maximum point of $H$
$\iff \int \dfrac{q(x) d_k(x)}{\nu^T f(x)}\,dx = 0$
$\iff \int [1 - a + a p(x)][f_k(x) - f_L(x)]\,dx = 0$
$\overset{(a)}{\iff} \int p f_k = \int p f_L \overset{(b)}{\iff} \int p f_k = 1$ all $k$,

where (a) is due to $a > 0$ and $\int f_k = 1$ and (b) is due to the fact
that $\int p f_k$ being all equal implies each being equal to
$\int p\,\nu^T f = 1$.

Given u 6 r°, fix r > such that 5(-u., 2r) C r°. It is not hard to see 
that there is a > 0, such that for any v G B(u,r), 7r~ 1 (?;) + w G S 10 , 
Vio G 5(0,20-) with £> fc = 0. Then by (A.l), 

(A. 3) h(v,x) > a[Mf(x) — mf(x)], all v G B(u,r) and z. 

For x with > by Lemma A. 3. 2, h(u + v,x) > 0, \/v G 5(0, r). 
Therefore, ]n[h(u + v,x)/h(u,x)] is well-defined and by Taylor's expansion, 



, h(u + v,x) 

m T7 s = / 

h{u,x) ^ 



F d k (x)v k d k (x) 2 vl 



h(u,x) 2h(u + zv,x) 2 



for some z = z(v, x) G [0, 1]. As m + zi; G B(u, r), by Lemma A. 3. 2, h(u + 
zv,x) > and by (A. 3), /i(u + zv,x) > a[Mf(x) — mf{x)\. On the other 
hand, |dfe(x)| < Mf(x) —mk{x). Thus \d k (x)/h(u + zv,x)\ < 1/a. Likewise, 
\d k {x)/h{u)\ < 1/a. As a result, 



H(u + v) - H(u) = J q(x) In 



L-l 



h(u + v,x) 

~u ^ dx 

n[u, x) 

q{x)d k (x 



which finishes the proof. □ 
A.4. Proofs for Section 4. 

Proof of (4.1). Recall that the overall distribution under true nulls is 
$\sum_{k=1}^{L} \nu_k F_k$ and the distribution of $X_1, \ldots, X_n$ is $Q = (1 - a)\nu^T F + aG$. 
Then $(1 - a)\nu^T F(X_i) \le Q(X_i)$. By the assumption, $Q$ is continuous, which 
implies that $Q(X_i)$ are iid $\sim \mathrm{Unif}(0, 1)$. Then for the order statistics $X_{(1)} \le 
X_{(2)} \le \cdots \le X_{(n)}$, 

$$\big(Q(X_{(k)}),\ 1 \le k \le n\big) \sim \left(\frac{\xi_1 + \cdots + \xi_k}{\xi_1 + \cdots + \xi_{n+1}},\ 1 \le k \le n\right),$$
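where $\xi_1, \ldots, \xi_{n+1}$ are iid with density $e^{-x}\mathbf{1}\{x > 0\}$. By the exponential 
inequality, for $\beta \in (0, 1)$, $P(\xi_1 + \cdots + \xi_{n+1} \le \beta(n + 1)) \le (\beta e^{1-\beta})^{n+1}$. Therefore, 
for each $i \le a_n$, 

$$\begin{aligned}
P\big(Q(X_{(i)}) \le \Gamma^*(1/n; i, 1/\beta)\big)
&= P\left(\frac{\sum_{j=1}^{i} \xi_j}{\sum_{j=1}^{n+1} \xi_j} \le (1/n)\,g^*(1/n; i, 1/\beta)\right) \\
&\ge P\left(\sum_{j=1}^{i} \xi_j \le \beta\,g^*(1/n; i, 1/\beta),\ \sum_{j=1}^{n+1} \xi_j > \beta n\right) \\
&\ge P\left(\sum_{j=1}^{i} \xi_j \le \beta\,g^*(1/n; i, 1/\beta)\right) - P\left(\sum_{j=1}^{n+1} \xi_j \le \beta n\right).
\end{aligned}$$

Because $\sum_{j=1}^{i} \xi_j$ follows the Gamma distribution with shape parameter 
$i$ and scale parameter $\beta$, the above inequalities yield 

$$P\big(Q(X_{(i)}) \le \Gamma^*(1/n; i, 1/\beta)\big) \ge 1 - \frac{1}{n} - (\beta e^{1-\beta})^{n+1}.$$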
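As a result, 

$$\begin{aligned}
& P\big((1 - a)\nu^T F(X_{(i)}) \le \Gamma^*(1/n; i, 1/\beta),\ \text{all } i \le a_n\big) \\
&\qquad \ge P\big(Q(X_{(i)}) \le \Gamma^*(1/n; i, 1/\beta),\ \text{all } i \le a_n\big) \\
&\qquad \ge 1 - a_n\left[\frac{1}{n} + (\beta e^{1-\beta})^{n+1}\right].
\end{aligned}$$

Following the proof of Theorem 3.1, 

$$\mathrm{FDR} \le \alpha + E\left[\frac{\cdots\,\mathbf{1}\{R > 0\}}{R \vee 1}\right] \le \alpha + 2(1 + |T_n|)\exp(-2n\epsilon_n^2) + a_n\left[\frac{1}{n} + (\beta e^{1-\beta})^{n+1}\right].$$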

Note $\beta e^{1-\beta} < 1$. With $\epsilon_n = \sqrt{\ln n / n}$ and $|T_n| = \lfloor (\ln n)^2 \rfloor$, it is easy to see 
$r_n \to 0$ as $n \to \infty$. Furthermore, for $a_n = n^{0.2}$, $\beta = 0.95$, and $n = 5000$, 
$r_n \approx 9.64 \times 10^{-3}$. □ 
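
The two numerical ingredients of this proof can be checked with a short script. The sketch below rests on assumptions rather than on the paper's own code: it takes $r_n$ to be $2(1 + |T_n|)\exp(-2n\epsilon_n^2) + a_n[1/n + (\beta e^{1-\beta})^{n+1}]$, and the helper names `erlang_cdf`, `chernoff_bound` and `r_n` are hypothetical. The first part compares the exponential inequality with the exact probability for a few small sample sizes, using the fact that a sum of $m$ iid standard exponentials has an Erlang distribution; the second part evaluates the assumed $r_n$ at the quoted parameter values.

```python
import math

def erlang_cdf(m, x):
    """P(xi_1 + ... + xi_m <= x) for iid standard exponential xi_j.

    Uses the identity P(Erlang(m) <= x) = 1 - sum_{j<m} exp(-x) x^j / j!.
    Intended for moderate m and x only (exp(-x) underflows for large x).
    """
    term = math.exp(-x)
    total = term
    for j in range(1, m):
        term *= x / j
        total += term
    return 1.0 - total

def chernoff_bound(m, beta):
    """Exponential-inequality bound (beta * e^(1 - beta))^m."""
    return (beta * math.exp(1.0 - beta)) ** m

# Part 1: exact probability versus the bound, for illustrative m and beta.
for m in (5, 20, 50):
    for beta in (0.5, 0.8, 0.95):
        exact = erlang_cdf(m, beta * m)
        bound = chernoff_bound(m, beta)
        print(f"m={m:3d} beta={beta:.2f} exact={exact:.3e} bound={bound:.3e}")

def r_n(n, beta, a_n, t_size, eps):
    """Assumed form of the error term in the FDR bound (see lead-in)."""
    first = 2.0 * (1 + t_size) * math.exp(-2.0 * n * eps ** 2)
    second = a_n * (1.0 / n + chernoff_bound(n + 1, beta))
    return first + second

# Part 2: the parameter values quoted at the end of the proof.
n, beta = 5000, 0.95
value = r_n(
    n,
    beta,
    a_n=n ** 0.2,
    t_size=math.floor(math.log(n) ** 2),
    eps=math.sqrt(math.log(n) / n),
)
print("r_n:", value)
```

Under these assumptions the computed value is of the order of $10^{-2}$ and close to the figure quoted above; small differences in the last digit can arise from rounding or from how $a_n$ is discretized.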
[Figure: five panels, Simulation 1 to Simulation 5.] 

Fig 1. Plots of $np_{(i)}/i$ versus $i/n$ in simulations 1-5 for different types of p-values: $p_{i,\mathrm{seq}}$ 
("lp-sequential"), $p_{i,\mathrm{glb}}$ ("lp-global"), $p_{i,\max}$ ("max"), and $p_{i,\mathrm{mix}}$ ("mix"). 
Fig 2. Plots of $c_{k,(i)}$ versus $i/n$, $k = 1, \ldots, n$ in simulations 1 and 5, where $c_{1,(i)}, \ldots, c_{L,(i)}$ 
are the coefficients to attain $p_{(i),\mathrm{seq}}$ (left) or $p_{(i),\mathrm{glb}}$ (right). 
[Figure: five panels, Simulation 1 to Simulation 5; horizontal axis from 0.01 to 0.05.] 

Fig 3. Plots of $np_{(i)}/i$ versus $i/n$ in simulations 1-5, with $i/n \le 0.05$. The plots with 
open symbols are those of $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ as in Figure 1. The plots with closed symbols 
are those of $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ computed with the extra constraint $c_1 + \cdots + c_L \ge 0.9$. 
[Figure: two panels, Simulation 1 and Simulation 5.] 

Fig 4. Plots of $c_{k,(i)}$ versus $i/n$, $k = 1, \ldots, n$ in simulations 1 and 5, where $c_{1,(i)}, \ldots, c_{L,(i)}$ 
are the coefficients to attain $p'_{(i),\mathrm{seq}}$ (left) or $p'_{(i),\mathrm{glb}}$ (right), under the constraint $c_1 + \cdots + 
c_L \ge 0.9$ in addition to those for $p_{(i),\mathrm{seq}}$ and $p_{(i),\mathrm{glb}}$ in Figure 2. 



